PREFACE



Project URBANDOC is reporting on four years of activity as an Urban Renewal
Demonstration Project at The City University of New York. The project evolved from a
need for improving bibliographic services in urban affairs - and specifically urban renewal
- at a time when computer technology was being incorporated into a wide range of
information systems. URBANDOC was one of the first of the library-information science
systems to deal specifically with the social sciences.

The final report consists of three volumes: the Demonstration Report, the General
Manual (Technical Supplement 1), and the Operations Manual (Technical Supplement 2).
Each of these is bound separately and intended for separate distribution. For the most
general reader who wishes an overall view of the objectives, features, accomplishments,
conclusions and recommendations of the project, the Demonstration Report should
suffice.

The General Manual is designed to provide the reader with detailed knowledge of the 
techniques developed for handling the documents according to library-information 
science practices as developed by Project URBANDOC. While it also provides an overview 
of the programming system used by the project, the Operations Manual should be 
consulted for detailed systems analysis, programming, and operations data. 

The U.S. Department of Housing and Urban Development has been most generous in its 
assistance of Project URBANDOC, from project submission to final report. HUD's 
commitment to the Demonstration was as important conceptually as it was economically,
and the University's indebtedness is thus twofold. The President and
Deans of the University Graduate Division join the New York City Planning Commission 
and the URBANDOC staff in thanking the Department for having made possible each 
of these three final volumes, as well as the entire project. 

INTRODUCTION

Systems Background 

The General Manual states the objectives of the computer component of the
URBANDOC project as:

To build and maintain a thesaurus in machine-readable form;

To build and maintain a document master file of bibliographic records 
containing both the content and descriptive analyses in machine-readable 
form; 

To organize the machine-readable files for retrieval and publications purposes; 

To establish search programs to query the document master file; 

To establish publications programs to produce listings and indexes to these 
listings. 

The system realizing these objectives consists of five subsystems or modules, a module 
being defined as all the programs required to meet a specific objective. The modules as 
finally developed are Pre-edit, Thesaurus, File Maintenance, Search and Publications. The 
programs included in each module came from a variety of sources: URBANDOC 
programs, the IBM Program Library, and the Engineering Index. Each source and the role
it played will be discussed in detail. 

The core of the system was a set of computer programs from the IBM Program Library, 
the Combined File Search System, often referred to as CFS. It seemed to fulfill the
majority of URBANDOC's requirements in the areas of thesaurus, file maintenance and
search, as well as allow for a gradual expansion to a total systems approach. This set of
programs was designed for use on the IBM 1401 computer. (For details of configuration,
see the following section on Computer Equipment.)

The Pre-edit and Publications Modules were completely developed and implemented by 
URBANDOC. By adding several programs which were not part of the system as 
distributed by the IBM Program Library, URBANDOC enlarged the nature of the 
Thesaurus Module and File Maintenance Module. The Thesaurus Module also included
another program from the IBM Program Library, the one which permutes the Thesaurus. 
The Search Module was expanded by the incorporation of the search subsystem 
acquired from the Engineering Index. Throughout the entire system, sorting of the data 
files is performed by SORT7, another program from the IBM Program Library. 

The source of a program has a direct bearing on the issues of support and maintenance. 
While URBANDOC was responsible for supporting the programs in its system, the degree 
of effort expended was directly related to the source of the program. URBANDOC was 
best able to maintain the programs it had created. The maintenance of the programs from 
the Engineering Index and the IBM Program Library involved other factors, namely that
as users rather than primary developers, the project was less familiar with the detailed
workings of each program and, in some instances, less able to carry out all changes.

The programs from the IBM Program Library were classified as "Type III programs". 
These were programs donated to the library by their corporate developers which were 
distributed on request to users of IBM equipment. IBM had no maintenance
commitment for these systems. The documentation accompanying the programs usually
included the names of individual authors who could be contacted in case of problems or
questions. Of course, such programs as SORT7, the COBOL compiler, and the
AUTOCODER assembler were Type I programs which were completely supported by IBM.

The task of program maintenance for permuting the Thesaurus was an issue that never 
arose; URBANDOC did not encounter operating difficulties with that particular program. 
The situation in the case of the CFS programs was entirely different. URBANDOC 
resolved many of the issues by itself and a CFS Users' Group was formed to cooperatively 
solve the remaining issues. (It is interesting to note that approximately seven hundred 
changes were made to the CFS system with an estimated average of two man-hours 
required per change.) 

With regard to the Engineering Index programs, some minor format changes were made to 
produce retrieval output in agreement with URBANDOC terms rather than the ones used 
by the Engineering Index. In other instances, the Engineering Index did provide 
URBANDOC with assistance in learning to use the search subsystem to its best advantage. 
In this respect, the assistance was more a matter of training than of resolving
programming problems.

With regard to prospective support and maintenance of the various programs, none of the 
previous arrangements can be relied upon for assistance. Engineering Index is no longer 
actively supporting these programs; in some instances it may be possible to make separate 
arrangements with them for additional support and/or maintenance. The persons 
originally named on programs in the IBM Program Library have long since lost intimate 
familiarity with these particular programs and have perhaps moved to other projects, 
organizations, etc. URBANDOC cannot provide future support for the system unless 
special arrangements are made with The City University Graduate Division. 

Computer Equipment 

URBANDOC has been operating its system on the IBM 1401 computer at the Baruch 
College of The City University of New York. URBANDOC does not use all the features 
and devices in this particular configuration. The portions that it does use are the 
following: 

12,000 positions of computer memory (12K);
a model 1402 card reader-punch;
a model 1403 printer;
four tape drives;
one model 1311 disk drive with disk packs;
advanced programming special feature;
high-low-equal compare special feature; and
sense switches.

Except for the Publications Module, one program in the Pre-edit Module and the
Engineering Index programs, the system could be operated on a machine with the
following features and devices:

8,000 positions of computer memory (8K); 
a model 1402 card reader-punch; 
a model 1403 printer; 
four tape drives;
advanced programming special feature;
high-low-equal compare special feature; and
sense switches.

(See the program inventory in each module discussion for the specific configuration 
applicable to each program.) 

As noted above, any model tape drive on the 1401 can be used without performing
systems changes provided that the tapes are created at a density of 556 characters per
inch (cpi). If a density of 800 cpi is used, the sort control cards must be changed to
reflect the new tape density. (See Chapter IX, Operating Instructions.)

URBANDOC uses the 1311 disk drive to facilitate searching (see Chapter V, specifically 
the abstract for PHASE4, and Chapter IX, specifically the operating instructions for 
PHASE4) and to assemble or compile new programs under development. Searching can be 
performed without the use of a disk drive and programs can be assembled using the tape 
version of COBOL and AUTOCODER translators. 

Other special features such as multiply-divide, print storage and a model 1407 inquiry
console might exist on other systems. The multiply-divide feature is not necessary to the 
operation of the system. If present, it produces no added advantages. The same holds true 
for the inquiry console. However, the presence of the print storage feature could serve to 
reduce run time for programs with heavy printing. 

URBANDOC's system could also be run on an IBM 1460, an IBM 1410 with 1401
compatibility or an IBM 360 with the 1401 compatibility feature. It might also be
possible to operate the system on other manufacturers' equipment (e.g., the Honeywell
200) through the use of a translator like "Liberator". (However, URBANDOC has not
tested any of the above procedures and could not guarantee their success.) The above
statements on machine-independence assume that other systems meet the input-output
requirements concerning a card reader-punch, printer, and tape drives.

Organization of the Manual 

The Operations Manual is organized according to usage: Chapter II through Chapter VI for
systems analysis and programming, Chapter VII through Chapter XIII for operations.
Systems analysis and programming are, in turn, organized by program module. A program
inventory and detailed specifications of the input files and both major and auxiliary data
files are given in the overview of each program module. Abstracts and programmers'
notes (where applicable) are also provided for each program for a complete discussion of
each module.

Operations are divided into several areas. Data Entry, Processing Cycles, Operating 
Instructions and Error Listings are part of the daily operation of the system. Timing and 
Tape Library and Report Controls are part of the management of the system. 

THESAURUS MODULE

The Thesaurus File 

The Thesaurus File is the control file for the content analysis terms, that is, terms from
the URBANDOC Thesaurus that are assigned by the document analyst to describe a
document's content. (See the General Manual, Chapter IV, specifically the Subject
Thesaurus.) The Thesaurus File is used primarily to validate the content analysis terms, to
indicate the correct usage of a term to the document analyst and to make the substitution
from the natural language geographic term to code form. (See the General Manual,
Chapter IV, specifically the Geographical Thesaurus.) The Thesaurus File itself contains
two types of terms: subject and geographic. From a data processing viewpoint, no
distinction is made between the two types.

CFS systems requirements state that all the information for one term be contained within 
a single tape record. URBANDOC has developed a technique for the extension of term 
information to several tape records. (This will be discussed under continuation records.) 
In all future discussions, however, a reference to a term is to one tape record. The system 
treats continuation records as "new terms". 

When considering type of information on the Thesaurus File, the reference is to the 
sections (Term, Cross-reference and Program Utility) within a single tape record. The 
computer system uses the Term Section to perform the computer validation of terms and 
the substitution of the geographic codes. The Cross-reference Section contains the data 
indicating to the document analyst the correct usage of the terms; this information is not 
processed but only printed by the computer system. The Program Utility Section is
included to facilitate computer processing during file maintenance. This information is 
used only by the programs and is not made available either to the document analyst or 
external user of the Thesaurus. 

With no distinction made between the subject and geographic information, there is only 
one type of tape record on the file. The maximum size of this tape record is 1296 
characters. The Program Utility Section and Term Section are fixed in structure and size. 
Only the Cross-reference Section is variable in length. 

The Program Utility Section is contained in every record. The first section on the tape 
record, it contains the following information about the size and nature of the term being 
entered: 

Record size: a three-position field containing the number of characters in the 
record. For records with more than 999 characters of information, the field is 
maintained in 1401 machine-language code for core storage addresses. (See Figure 
2.); 

Term size: a three-position field containing the number of characters in the actual 
term field. A term may be variable in length to a maximum of fifty-four characters; 

Cross-reference size: a three-position field containing the number of characters of
cross-reference data. For records with more than 999 characters of cross-reference
information, the field is maintained in 1401 machine-language code for core storage
addresses;

Substitution flag: a one-position code indicating the presence of substitute
information for a term;

Truncation flag: a one-position flag indicating the presence of a term root; 

Entry code: a one-position code for the authorized usage of a term as a descriptor, 
subdescriptor or both. 
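
The fixed layout of the Program Utility Section can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the six fields are taken to occupy the first twelve positions of the record in the order listed above, the flag and code values are passed through uninterpreted, and the names `ProgramUtility`, `decode_size` and `parse_program_utility` are hypothetical. The zone tables follow the standard 1401 core storage address coding, with "‡" standing for the record mark.

```python
from dataclasses import dataclass

# 1401 address characters, indexed by digit 0-9.
# Row 0: no zone bits, row 1: A-bit, row 2: B-bit, row 3: A- and B-bits.
ZONES = ["0123456789", "‡/STUVWXYZ", "!JKLMNOPQR", "?ABCDEFGHI"]


def _zone_and_digit(ch):
    """Return (zone row, digit) for one address character."""
    for zone, chars in enumerate(ZONES):
        if ch in chars:
            return zone, chars.index(ch)
    raise ValueError(f"not an address character: {ch!r}")


def decode_size(field):
    """Decode a three-position size field, plain decimal or 1401 address code."""
    hundreds_zone, hundreds = _zone_and_digit(field[0])
    tens_zone, tens = _zone_and_digit(field[1])
    units_zone, units = _zone_and_digit(field[2])
    if tens_zone:
        raise ValueError("zone bits over the tens position are not part of an address")
    # Zones over the hundreds position add 1000-3000; over the units, 4000-12000.
    return units_zone * 4000 + hundreds_zone * 1000 + hundreds * 100 + tens * 10 + units


@dataclass
class ProgramUtility:
    record_size: int
    term_size: int
    xref_size: int
    substitution_flag: str
    truncation_flag: str
    entry_code: str


def parse_program_utility(record):
    """Split the first twelve positions of a tape record (assumed field order)."""
    return ProgramUtility(
        record_size=decode_size(record[0:3]),
        term_size=decode_size(record[3:6]),
        xref_size=decode_size(record[6:9]),
        substitution_flag=record[9],
        truncation_flag=record[10],
        entry_code=record[11],
    )
```

For example, a size field of "S96" decodes to 1296, the maximum record size, while "054" is read as an ordinary decimal count.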

The Term Section is composed of the authorized term, the preferred form (or substitute) 
for geographic terms and synonyms, the date of the term's entry into the Thesaurus and 
its last date of revision. (For the formation of these fields, see Chapter VII, specifically 
Thesaurus Data Entry.) It also includes a type code indicating the authorized usage of a
descriptor. This section immediately follows the Program Utility Section in the record. 

Type code indicates whether a descriptor may be used for independent searching or if it 
must be used in combination with another term. Since subdescriptors cannot be searched 
independently, this field is blank if the term is a subdescriptor. It is always used for a 
descriptor. (For the contents of this field, see Chapter VII, specifically Thesaurus Data 
Entry.) 

The substitute entries for a term (if used) include a search substitute and a publications 
substitute. The search substitute is used to replace an input term used as a descriptor or 
subdescriptor. The publications substitute is used to replace an input term used as a 
subject heading. The original CFS system made provision for just a search substitute. 
Since this caused problems when the term was used as a subject heading, the publications 
substitute was created which eliminated all coding and enlarged the size of the substituted 
field. These fields will be used only if the substitution flag has been set. (For further 
information on the format of each substitute, see Chapter VII, specifically Thesaurus 
Data Entry.) 

The General Manual contains a detailed discussion of the formation of the descriptor,
subdescriptor and publications substitute. Each term will be stored left-justified in the
field and the remainder of the field right-padded with blanks if necessary. The fields are
stored on the record exactly as entered onto the worksheet. (For further information, see
Chapter VII, specifically Thesaurus Data Entry. See also the General Manual, Chapter VI,
specifically the Thesaurus File Description.)

The Cross-reference Section is not processed but only printed by the computer system. 
This is information that may be entered by the document analyst for use by himself or an 
external user. The contents and format are unstructured except that within one record 
this section cannot exceed sixteen lines of information and each line may not exceed 
seventy-two characters in length. The system does not actually count the number of 
characters included on a card until it reaches the maximum limit. Rather, it accepts the 
first sixteen cards and stores seventy-two characters of information from each.

The Cross-reference Section may contain such information as the relationship of a term
to other terms in the Thesaurus or scope notes. Within each line of information, the data 
is stored exactly as entered onto the worksheet. This data is not edited other than for 
length limitations. The Cross-reference Section immediately follows the Term Section in 
the record. (For further information as to the contents of this section, refer to Chapter 
VII, specifically Thesaurus Data Entry.) 



In certain instances, there are more than sixteen cross-references for a term. In this case, a 
continuation record must be created for each additional sixteen cross-references. The 
continuation record will not be processed but only printed by the system. The basic term 
record will contain the term and the first sixteen cross-references. The continuation 
record will contain a continuation note as part of the term since more than one 
continuation record may be necessary, for example: FORMATS (CONTA) and the next 
sixteen cross-references. 
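
The sixteen-line limit and the use of continuation records can be sketched in Python as follows. This is a minimal illustration, not the CFS or URBANDOC logic: the function name `split_cross_references` is hypothetical, and the lettered continuation note is an assumption modelled on the "FORMATS (CONTA)" example above.

```python
MAX_LINES = 16      # cross-reference lines per record
MAX_LINE_LEN = 72   # characters stored from each line


def split_cross_references(term, lines):
    """Return (term text, lines) pairs: the basic record, then continuations."""
    # Only the first seventy-two characters of each line are kept.
    trimmed = [line[:MAX_LINE_LEN] for line in lines]
    chunks = [trimmed[i:i + MAX_LINES] for i in range(0, len(trimmed), MAX_LINES)]
    if not chunks:
        chunks = [[]]
    records = [(term, chunks[0])]
    for n, chunk in enumerate(chunks[1:]):
        # Hypothetical note format: (CONTA), (CONTB), ... one letter per record.
        note = f"(CONT{chr(ord('A') + n)})"
        records.append((f"{term} {note}", chunk))
    return records
```

A term with forty cross-reference lines would yield a basic record with sixteen lines and two continuation records with sixteen and eight lines. Because each continuation carries its own term text, the system can treat it as a "new term", as noted earlier.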

*Basic: Part of the standard processing for updating the Thesaurus File.

On-demand: Processing performed on request only.

**Set 1: 12K memory; 1402 card reader-punch; 1403 printer; 4 tape drives; 1311 disk drive;
advanced programming; high-low-equal compare; sense switches.

Set 2: 8K memory; 1402 card reader-punch; 1403 printer; 4 tape drives; advanced programming;
high-low-equal compare; sense switches.

THESAURUS FILE RECORD LAYOUT 1

1 Reproduced from Chapter VI of the General Manual.

Address     Code      Address     Code      Address       Code      Address       Code

A-Bit (0-Zone) over Hundreds Position
1400-1499   U00-U99   5400-5499   U0‡-U9Z   9400-9499     U0!-U9R   13400-13499   U0?-U9I
1500-1599   V00-V99   5500-5599   V0‡-V9Z   9500-9599     V0!-V9R   13500-13599   V0?-V9I
1600-1699   W00-W99   5600-5699   W0‡-W9Z   9600-9699     W0!-W9R   13600-13699   W0?-W9I
1700-1799   X00-X99   5700-5799   X0‡-X9Z   9700-9799     X0!-X9R   13700-13799   X0?-X9I
1800-1899   Y00-Y99   5800-5899   Y0‡-Y9Z   9800-9899     Y0!-Y9R   13800-13899   Y0?-Y9I
1900-1999   Z00-Z99   5900-5999   Z0‡-Z9Z   9900-9999     Z0!-Z9R   13900-13999   Z0?-Z9I

B-Bit (11-Zone) over Hundreds Position
2000-2099   !00-!99   6000-6099   !0‡-!9Z   10000-10099   !0!-!9R   14000-14099   !0?-!9I
2100-2199   J00-J99   6100-6199   J0‡-J9Z   10100-10199   J0!-J9R   14100-14199   J0?-J9I
2200-2299   K00-K99   6200-6299   K0‡-K9Z   10200-10299   K0!-K9R   14200-14299   K0?-K9I
2300-2399   L00-L99   6300-6399   L0‡-L9Z   10300-10399   L0!-L9R   14300-14399   L0?-L9I
2400-2499   M00-M99   6400-6499   M0‡-M9Z   10400-10499   M0!-M9R   14400-14499   M0?-M9I
2500-2599   N00-N99   6500-6599   N0‡-N9Z   10500-10599   N0!-N9R   14500-14599   N0?-N9I
2600-2699   O00-O99   6600-6699   O0‡-O9Z   10600-10699   O0!-O9R   14600-14699   O0?-O9I
2700-2799   P00-P99   6700-6799   P0‡-P9Z   10700-10799   P0!-P9R   14700-14799   P0?-P9I
2800-2899   Q00-Q99   6800-6899   Q0‡-Q9Z   10800-10899   Q0!-Q9R   14800-14899   Q0?-Q9I
2900-2999   R00-R99   6900-6999   R0‡-R9Z   10900-10999   R0!-R9R   14900-14999   R0?-R9I

AB-Bits (12-Zone) over Hundreds Position
3000-3099   ?00-?99   7000-7099   ?0‡-?9Z   11000-11099   ?0!-?9R   15000-15099   ?0?-?9I
3100-3199   A00-A99   7100-7199   A0‡-A9Z   11100-11199   A0!-A9R   15100-15199   A0?-A9I
3200-3299   B00-B99   7200-7299   B0‡-B9Z   11200-11299   B0!-B9R   15200-15299   B0?-B9I
3300-3399   C00-C99   7300-7399   C0‡-C9Z   11300-11399   C0!-C9R   15300-15399   C0?-C9I
3400-3499   D00-D99   7400-7499   D0‡-D9Z   11400-11499   D0!-D9R   15400-15499   D0?-D9I
3500-3599   E00-E99   7500-7599   E0‡-E9Z   11500-11599   E0!-E9R   15500-15599   E0?-E9I
3600-3699   F00-F99   7600-7699   F0‡-F9Z   11600-11699   F0!-F9R   15600-15699   F0?-F9I
3700-3799   G00-G99   7700-7799   G0‡-G9Z   11700-11799   G0!-G9R   15700-15799   G0?-G9I
3800-3899   H00-H99   7800-7899   H0‡-H9Z   11800-11899   H0!-H9R   15800-15899   H0?-H9I
3900-3999   I00-I99   7900-7999   I0‡-I9Z   11900-11999   I0!-I9R   15900-15999   I0?-I9I

Units Position: Address Digit and Code

Addresses 4000-7999     Addresses 8000-11999    Addresses 12000-15999
Digit   Code            Digit   Code            Digit   Code
0       ‡               0       !               0       ?
1       /               1       J               1       A
2       S               2       K               2       B
3       T               3       L               3       C
4       U               4       M               4       D
5       V               5       N               5       E
6       W               6       O               6       F
7       X               7       P               7       G
8       Y               8       Q               8       H
9       Z               9       R               9       I


2 International Business Machines, Data Processing Division, IBM 1401 Data Processing System Operator's Guide,
151 p. (White Plains, N.Y., 1965), p. 140.



Figure 2 
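
The coding in Figure 2 follows a regular rule: zone bits over the hundreds position add 1000, 2000 or 3000 to the address, and zone bits over the units position add 4000, 8000 or 12000. The Python sketch below encodes an address under that rule. It is an illustration only; the function name `encode_address` is hypothetical, and "‡" is used for the record-mark character.

```python
# 1401 address characters, indexed by digit 0-9.
# Row 0: no zone bits, row 1: A-bit, row 2: B-bit, row 3: A- and B-bits.
ZONES = ["0123456789", "‡/STUVWXYZ", "!JKLMNOPQR", "?ABCDEFGHI"]


def encode_address(addr):
    """Encode an address 0-15999 as a three-character 1401 address code."""
    if not 0 <= addr <= 15999:
        raise ValueError("address out of range for three-character coding")
    thousands, rest = divmod(addr, 1000)
    hundreds, rest = divmod(rest, 100)
    tens, units = divmod(rest, 10)
    hundreds_zone = thousands % 4   # 0, 1000, 2000 or 3000 within a 4000 block
    units_zone = thousands // 4     # which 4000 block the address falls in
    return ZONES[hundreds_zone][hundreds] + str(tens) + ZONES[units_zone][units]
```

Under this rule the maximum Thesaurus record size of 1296 characters would appear in the record size field as "S96".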



Program Abstracts 



XMAIN0 - Pre-list of the Thesaurus Input

Abstract 

The pre-list was designed to detect errors in the Thesaurus Input through computer
checking and visual verification prior to the Thesaurus File update. The pre-list provides 
proof-reading copy along with diagnostics of computer-detected errors. 

Programmer's Notes 

Definition of a term set. A term set is defined as all the units with the same accession
number. (See Chapter VII, specifically Thesaurus Data Entry.)

Use of accession number for changing a term. When changing a term on the Thesaurus
File, the transaction is handled by the deletion of the existing term and the entry of the
term in its new form. The entry of consecutive term sets with the same accession number
cannot be properly edited by the program. Since the deletion unit and the revised term
set have the same accession number, it is important to separate the deletion entries from
the additions in the input file.

Differentiation between additions and deletions. Even though the additions and deletions 
are kept separate, they can still be processed as part of the same run. Rather than treating 
them as two decks of cards, they are to be considered as two portions of one deck. 
Although the system does not specify which is to appear first, URBANDOC places the 
deletions before the additions to facilitate SORTX1.

XMAIN1 — Formatting the Thesaurus Input 

Abstract 

The Thesaurus Input, as first entered, is on cards. XMAIN1 edits the term sets for 
sequence, coding, etc. (See Chapter VII, specifically Thesaurus Data Entry.) The terms 
will be compacted and the term sets without errors formatted into single records. The 
terms which did not meet the editing standards will be included in the Error Listing and 
removed from further processing. 

Programmer's Notes

Differentiation between additions and deletions. When changing a term on the Thesaurus
File, the transaction is handled by the deletion of the existing term and the entry of the
term in its new form. Because of this, as discussed in XMAIN0, it is important to keep the
deletion entries separate from the additions in the input file.

Use of the accession number. Accession number is used only to tie together the various 
units of a term set in the input. Once the term set has been edited and a tape record 
created, accession number is dropped. Future processing will use the term itself as the 
control field for the file. 

Compaction of the term. CFS allows the document analyst to enter a term anywhere
within the term field on the Thesaurus Worksheet. As part of the formatting procedure,
the term will be stored at the left-most boundary of the field (left-justified within the
field). In the process of locating and shifting the term, any leading blanks and consecutive
embedded blanks (after the first) will be removed. The character content of the term is
not examined here. (For further information, see the General Manual, Chapter VI,
specifically the Thesaurus File Description.) The length of the term is not affected other
than for the removal of extra blank characters.
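
The compaction rule can be expressed in a few lines of Python. This is a sketch of the rule as described, not the XMAIN1 code; the function name `compact_term` is hypothetical, and the fifty-four position width is the term field size given in the file description.

```python
import re

TERM_FIELD_WIDTH = 54  # size of the term field on the Thesaurus File


def compact_term(raw):
    """Left-justify a term and collapse runs of embedded blanks to one blank."""
    text = raw.strip(" ")                # drop leading blanks and trailing padding
    text = re.sub(r" {2,}", " ", text)   # keep only the first of consecutive blanks
    return text.ljust(TERM_FIELD_WIDTH)  # right-pad with blanks to the field width
```

A worksheet entry such as "   URBAN    RENEWAL" would be stored as "URBAN RENEWAL" followed by blanks. The characters of the term itself are left unexamined, in line with the description above.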

SORTX1 - Sort the Thesaurus Input File 

Abstract 

Since the Thesaurus Input File will update the Thesaurus File, it must be sorted into 
alphabetical sequence by term. The Thesaurus Input is entered as two blocks of data, 
deletions and additions. Within each block, the entries appear in random alphabetical 
sequence by term. At the conclusion of the sort, the file will be in sequence by term 
(according to the computer collating sequence of the IBM 1401 as illustrated in Figure 3). 
For a term change in which there is both a deletion entry and an addition entry, the 
deletion entry will be placed before the addition entry. 
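
The ordering produced by the sort can be sketched in Python. Two assumptions are made for illustration: the action codes "D" (deletion) and "A" (addition) are hypothetical, and Python's default string ordering stands in for the IBM 1401 collating sequence, which differs from it.

```python
def sort_thesaurus_input(entries):
    """Sort (action, term) pairs by term, with a deletion ahead of an addition."""
    # Secondary key: 0 for a deletion, 1 for anything else, so that a term
    # change is applied as delete-then-add during the update.
    return sorted(entries, key=lambda e: (e[1], 0 if e[0] == "D" else 1))
```

Given the additions ZONING, HOUSING and BIDS and a deletion of HOUSING, the sorted file would read BIDS (add), HOUSING (delete), HOUSING (add), ZONING (add).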

XMAIN2 — Create and/or Update the Thesaurus File 
Abstract 

The Thesaurus File will be updated with the contents of the Sorted Thesaurus Input File. 
Existing terms may be deleted. New terms may be added. The contents of an existing 
term set (term substitutes, cross-references, etc.) may be revised through the combination 
of a deletion entry and the reentry of the term set in its revised form. 

Before processing any of the above transactions, the program will check for possible 
errors within the input — such as entering a duplicate term or deleting a non-existent 
term. Any inconsistencies in the input will be flagged in the Error Listing. 
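
The update and its error checks can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the Thesaurus File is modelled as a dictionary keyed by term rather than a sequential tape file, the action codes "D" and "A" and the message texts are hypothetical, and each transaction is checked as it is applied.

```python
def update_thesaurus(master, transactions):
    """Apply sorted (action, term, record) transactions to a copy of the master.

    Returns the updated file and a list of (term, message) error entries.
    """
    updated = dict(master)
    errors = []
    for action, term, record in transactions:
        if action == "D":
            if term in updated:
                del updated[term]
            else:
                errors.append((term, "deletion of a non-existent term"))
        elif action == "A":
            if term in updated:
                errors.append((term, "duplicate term"))
            else:
                updated[term] = record
        else:
            errors.append((term, "unknown action code"))
    return updated, errors
```

A revision is then simply a deletion followed by an addition of the same term, which is why the sort places the deletion entry first.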

XMAIN3 — Print the Thesaurus File 
Abstract 

The Thesaurus File will be printed in the form of publishable copy or working copy. The 
printed Thesaurus will include authorized usage, substitutes, cross-references and scope 
notes for each term. The entire file, the subject section or the geographic section may be
selected for listing. In addition, checkpoint restart is provided on an interrupt basis. Input 
for a new Permuted Thesaurus File may also be created. 

Programmer's Notes 

Publishable copy of the Thesaurus File. The nature of XMAIN3 is such that the
publishable copy of the Thesaurus could not be produced from the basic program. To
produce publishable copy, there is a set of cards identified as the "Publications Patches
to XMAIN3". When producing publishable copy of the Thesaurus File, these cards should
be inserted before the last card of the program deck and removed at the completion of
the run.

The Permuted Thesaurus File. XMAIN3 was modified by URBANDOC to create punched 
card output of the term entries to be used as the input to XMAIN5. The nature of the 
program modification is such that when creating punched card output all terms are 
punched. It is not possible to punch some of the terms. 

Unless the terms added to and deleted from the Thesaurus File have been accumulated on 
punched cards by the document analysts since the creation of the latest Permuted 
Thesaurus File, the entire file of terms must be recreated. For this option, the
"Publications Patches to XMAIN3" must be used.

Report date. Report date is the date in the tape label for the file. For the Thesaurus File,
this date is the date of the last file maintenance. For the Thesaurus Supplement File, it 
will be the date used in the selection of the terms (cut-off date). 

XMAIN4 - Statistical Analysis of the Thesaurus File

Abstract

The structure of the Thesaurus will be examined for the number of terms, number of 
searchable terms, relationships of terms to each other in hierarchical and lateral 
relationships, etc. The Statistical Analysis will provide the frequency counts for both the 
subject and geographic sections of the Thesaurus File. 

XMAIN5 - Permute the Thesaurus File

Abstract

When maintaining and printing the Permuted Thesaurus, an existing file may be updated 
or a new file created. Before the actual permutation of the words in the term, the entry is 
edited. Terms with words of excessive length will be listed for the document analyst's 
attention and will not be permuted for that word. A term containing a "stop word" also 
will not be permuted for that word. A final version of the Permuted Thesaurus may be 
printed in single or multiple copies. An output tape is created for subsequent updating. 

Programmer's Notes 

Concept of a word. The Thesaurus File sets a limit of fifty-four characters on the size of a
term on the file. This does not imply a one-word term varying in length to fifty-four 
characters. For the most part, the longer terms in the URBANDOC Thesaurus contain 
several words which together are not more than fifty-four characters. 



XMAIN5 will permute a term on the words which form it. For this program, a "word" 
begins with the first character after a blank and ends with the last character before the 
next blank. As such, a word cannot exceed twenty characters. 

Stop word. A "stop word" is a user-supplied word which should not be used for 
permuting a term. These are also known as "common words" or "kill words", for 
example "an", "the", "to". 

The permuted entry. The permutation process creates one entry for each word in a term,
excluding "too long" words and stop words. This principle is illustrated in the following
sample of entries from the Permuted Thesaurus:

BICYCLING 
BIDS, BIDDING 
BIDDING, BIDS 

SIGNS, BILLBOARDS 

The entries are aligned along a center point. Each entry is positioned according to the 
word being permuted and its location within the term. The portion to the right of the 
alignment point is the "index entry" while the portion to the left is the "end of entry". 

Although a term may not be longer than fifty-four characters on the Thesaurus File, extra 
positions have been allotted to these fields on the Permuted Thesaurus File to allow for 
the proper positioning of longer terms. The terms are fully formatted for the printed page 
on the tape file. 
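
The permutation rules described above can be sketched in Python. This is not the IBM Program Library routine: the names `permute_term` and `format_entry` are hypothetical, and the way each entry is split around the alignment point (the permuted word and everything after it to the right, everything before it to the left) is an assumption made for illustration.

```python
MAX_WORD_LEN = 20  # word-length limit stated for XMAIN5


def permute_term(term, stop_words):
    """Return (entries, skipped) for one term.

    entries holds (end_of_entry, index_entry) pairs, one per permuted word;
    skipped holds words too long to permute, for the analyst's attention.
    """
    words = term.split()  # a word runs from one blank to the next
    entries, skipped = [], []
    for i, word in enumerate(words):
        if len(word) > MAX_WORD_LEN:
            skipped.append(word)
        elif word.upper() in stop_words:
            continue  # stop words produce no entry
        else:
            entries.append((" ".join(words[:i]), " ".join(words[i:])))
    return entries, skipped


def format_entry(left, right, width=30):
    """Right-align the left part so that entries line up on a common point."""
    return f"{left:>{width}} {right}"
```

With the stop words "OF" and "THE", the term "URBAN RENEWAL OF THE CITY" would yield three entries, one each for URBAN, RENEWAL and CITY.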

XMAIN6 — Create A Thesaurus Supplement File 

Abstract 

The Thesaurus Supplement was designed to reduce the number of new editions that must 
be published. The Thesaurus Supplement File will contain the new terms, subject, 
geographic, or both, that have been entered or revised after the cut-off date. A printed 
supplement can be obtained by using the Thesaurus Supplement File as the input to 
XMAIN3. 

Programmer's Notes 

Cut-off date. The cut-off date is a user-supplied date which determines the selection of
terms for the Thesaurus Supplement File. Any term will be included that has been 
entered or revised after the supplied date. 
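
The selection rule can be sketched in Python. The sketch assumes dates in the mm/dd/yy form used on the input cards, a blank change date for terms that have never been revised, and two-digit years falling in the 1900s; the function names are hypothetical.

```python
from datetime import date


def parse_mmddyy(text, century=1900):
    """Parse an mm/dd/yy date; the century is an assumption of this sketch."""
    month, day, year = (int(part) for part in text.split("/"))
    return date(century + year, month, day)


def select_for_supplement(terms, cutoff):
    """Pick terms entered or revised after the cut-off date.

    terms is an iterable of (term, entry_date, change_date) with mm/dd/yy
    strings; a blank change date means the term was never revised.
    """
    cut = parse_mmddyy(cutoff)
    selected = []
    for term, entered, changed in terms:
        dates = [parse_mmddyy(d) for d in (entered, changed) if d.strip()]
        if any(d > cut for d in dates):
            selected.append(term)
    return selected
```

With a cut-off date of 12/31/69, a term entered on 01/15/68 and revised on 03/02/70 would be selected, while one entered on 06/01/67 and never revised would not.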



Date of the supplement. The dates of the Thesaurus File and the Thesaurus Supplement 
File are contained in the tape labels for the files. The date of the Thesaurus File is the 
date of the last file update. For the Thesaurus Supplement File, the date will be the 
cut-off date for the run. 

Input Specifications 

Thesaurus Input 

(See Chapter VII, specifically Thesaurus Data Entry.) 

a. Main Term Format: 

Field                       Size     Cols      Contents 

Thesaurus Term              X(54)    1-54 
Entry Date                  X(8)     55-62     mm/dd/yy 
Filler                      X(2)     63-64 
Change Date                 X(8)     65-72     mm/dd/yy 
Accession Number            X(4)     73-76 
Sequence Number             X        77 
Filler                      X        78 
Truncation Code             X        79 
Entry Code                  X        80 
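
A fixed-column card such as the main term card can be split by column position alone. The following sketch is illustrative only: the column ranges are those of the Main Term Format, but the field names, the function name, and the choice to strip trailing blanks are assumptions for the example.

```python
# (name, first column, last column), 1-indexed as in the format table.
MAIN_TERM_FIELDS = [
    ("thesaurus_term", 1, 54),
    ("entry_date", 55, 62),
    ("change_date", 65, 72),
    ("accession_number", 73, 76),
    ("sequence_number", 77, 77),
    ("truncation_code", 79, 79),
    ("entry_code", 80, 80),
]


def parse_main_term_card(card):
    """Split one 80-column main term card into named fields."""
    if len(card) > 80:
        raise ValueError("a card holds at most 80 columns")
    card = card.ljust(80)                  # short cards are blank-filled
    return {name: card[first - 1:last].rstrip()
            for name, first, last in MAIN_TERM_FIELDS}
```

The filler columns (63-64 and 78) are simply not listed, so they are ignored.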

b. Substitute Term Format: 

Field                       Size     Cols 

Search Substitute           X(24)    1-24 
Publications Substitute     X(35)    25-59 
Filler                      X(13)    60-72 
Accession Number            X(4)     73-76 
Sequence Number             X        77 
Filler                      X(3)     78-80 


c. Cross-Reference Format: 

Field                       Size     Cols 

Cross-reference Code        X        1 
Filler                      X(2)     2-3 
Cross-reference             X(51)    4-54 
Filler                      X(18)    55-72 
Accession Number            X(4)     73-76 
Sequence Number             X        77 
Filler                      X(2)     78-79 
Card Code                   X        80 



d. Scope Note Format: 

Field                       Size     Cols 

Line                        X(54)    1-54 
Filler                      X(18)    55-72 
Accession Number            X(4)     73-76 
Sequence Number             X        77 
Filler                      X(2)     78-79 
Card Code                   X        80 



Permuted Thesaurus Input 
a. Term Card Format: 

Field Size Cols 

Thesaurus Term X(53) 1-53 

Filler X(27) 54-80 

Tape File Specifications 

Thesaurus Input File 
Sorted Thesaurus Input File 

a. File Format: 

Header label, tape mark 
Data records, tape mark 
Trailer label, tape mark 



b. Header Label Format: Single fixed-length record of 40 characters 

Field                       Size     Cols      Contents 

Identification              X(4)     1-4       '1HDR' 
Filler                      X(4)     5-8 
Label                       X(16)    9-24      'DICTIONARY-INPUT' 
Filler                      X(6)     25-30 
Date                        X(6)     31-36     mmddyy 
Filler                      X        37 
Reel Number                 X(3)     38-40 




c. Data Record Format: Unblocked variable-length records, maximum size of 1296 characters 

Field                       Size     Cols      Contents 

Record Size                 X(3)     1-3 
Term Size                   X(3)     4-6 
Cross-reference Size        X(3)     7-9 
Substitution Flag           X        10 
Filler                      X        11 
Truncation Flag             X        12 
Entry Code                  X        13 
Term                        X(54)    14-67 
Entry Date                  X(8)     68-75     mm/dd/yy 
Filler                      X(2)     76-77 
Change Date                 X(8)     78-85     mm/dd/yy 
Search Substitute           X(24)    86-109 
Publications Substitute     X(35)    110-144 
Cross-references            X(1152)  145-1296  Occurs as a variable number 
                                               of X(72) fields. Maximum 
                                               of 16 occurrences. 
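
The variable-length layout can be read as a fixed 144-character portion followed by zero to sixteen 72-character cross-reference fields. The sketch below illustrates that reading; it is not taken from the URBANDOC programs. Field names are hypothetical, and the three size fields are returned as raw text because their encoding is not described in this section.

```python
FIXED_LENGTH = 144      # columns 1-144
XREF_LENGTH = 72        # one cross-reference field
MAX_XREFS = 16          # 16 x 72 = 1152 characters, columns 145-1296


def parse_thesaurus_record(record):
    """Split one Thesaurus data record into its fixed and repeating parts."""
    tail = record[FIXED_LENGTH:]
    if len(record) < FIXED_LENGTH or len(tail) > XREF_LENGTH * MAX_XREFS:
        raise ValueError("record length outside 144-1296 characters")
    if len(tail) % XREF_LENGTH:
        raise ValueError("cross-references must be whole 72-character fields")
    return {
        "record_size": record[0:3],            # raw text, encoding not assumed
        "term_size": record[3:6],
        "cross_reference_size": record[6:9],
        "substitution_flag": record[9],
        "truncation_flag": record[11],
        "entry_code": record[12],
        "term": record[13:67].rstrip(),
        "entry_date": record[67:75],
        "change_date": record[77:85],
        "search_substitute": record[85:109].rstrip(),
        "publications_substitute": record[109:144].rstrip(),
        "cross_references": [tail[i:i + XREF_LENGTH].rstrip()
                             for i in range(0, len(tail), XREF_LENGTH)],
    }
```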



d. Trailer Label Format: Single fixed-length record of 40 characters 

Field                       Size     Cols      Contents 

Identification              X(4)     1-4       For a multi-reel file, only 
                                               the last reel contains '1EOF'; 
                                               all other reels contain '1EOR'. 
Filler                      X(36)    5-40 



Thesaurus File 
Thesaurus Supplement File 

a. File Format: 

Header label, tape mark 
Data records, tape mark 
Trailer label, tape mark 

b. Header Label Format: Single fixed-length record of 40 characters 

Field                       Size     Cols      Contents 

Identification              X(4)     1-4       '1HDR' 
Filler                      X(5)     5-9 
Label                       X(15)    10-24     'DICTIONARY FILE' 
Filler                      X(6)     25-30 
Date                        X(6)     31-36     mmddyy 
Filler                      X        37 
Reel Number                 X(3)     38-40 




c. Data Record Format: Unblocked variable-length records, maximum size of 1296 characters 

Field                       Size     Cols      Contents 

Record Size                 X(3)     1-3 
Term Size                   X(3)     4-6 
Cross-reference Size        X(3)     7-9 
Substitution Flag           X        10 
Filler                      X        11 
Truncation Flag             X        12 
Entry Code                  X        13 
Term                        X(54)    14-67 
Entry Date                  X(8)     68-75     mm/dd/yy 
Filler                      X(2)     76-77 
Change Date                 X(8)     78-85     mm/dd/yy 
Search Substitute           X(24)    86-109 
Publications Substitute     X(35)    110-144 
Cross-references            X(1152)  145-1296  Occurs as a variable number 
                                               of X(72) fields. Maximum 
                                               of 16 occurrences. 

d. Trailer Label Format: Single fixed-length record of 40 characters 

Field                       Size     Cols      Contents 

Identification              X(4)     1-4       For a multi-reel file, only 
                                               the last reel contains '1EOF'; 
                                               all other reels contain '1EOR'. 
Filler                      X(36)    5-40 



Permuted Thesaurus File 

a. File Format: 

No header label 
Data records, tape mark 
No trailer label 

b. Data Record Format: Fixed-length records of 80 characters 

Blocking factor of 6 
Padding record of 9s 

Field                       Size     Cols      Contents 

End of Entry                X(34)    1-34      (See XMAIN5, 
Index Entry                 X(35)    35-69     Programmer's Notes.) 
Filler                      X(11)    70-80 



References to the URBANDOC Final Report 

Much of the information presented in this chapter is designed to be used with sections 
of the General Manual (G.M.) and other sections of the Operations Manual (O.M.). 

For additional information on the Thesaurus File, its format and the considerations in 
creating and maintaining entries on the file, see: 

Manual    Chapter                                  Section 

G.M.      IV - Document Analysis: Content          Thesaurus 
G.M.      VI - Systems Modules: Input              Thesaurus Module: 
                                                   Thesaurus File Description 
O.M.      VII - Data Entry                         Thesaurus Data Entry 
O.M.      X - Error Listings and Systems           Thesaurus Module 
          Messages 

For additional information on the design and goals of the Thesaurus Module, see: 

Manual    Chapter                                  Section 

G.M.      VI - Systems Modules: Input              Thesaurus Module: 
                                                   Function, Tasks 

For additional information on operating this portion of the system, see: 

Manual    Chapter                                  Section 

O.M.      VIII - Processing Cycles                 Input Processing Cycle, 
                                                   Miscellaneous Thesaurus 
                                                   Products 
O.M.      IX - Operating Instructions              Thesaurus Module 
O.M.      XI - Tape Library and Report Controls    Thesaurus Module 
O.M.      XII - Timing                             Thesaurus Module 

PRE-EDIT MODULE 



Program Inventory 

Program                                               Basic or     Programming   Configuration 
Number    Program Function                 Source     On-demand*   Language      Set** 

E0010     To format records to the         URBANDOC   Basic        COBOL         Either 1 or 2 
          requirements of the Pre-edit 
          and File Maintenance Modules 

E0020     To expand the Document Input     URBANDOC   Basic        COBOL         Either 1 or 2 
          File, generate all the standard 
          input units and list the input 

E0030     To edit the Document Input       URBANDOC   Basic        COBOL         1 
          File for such errors as 
          sequence, coding, format, etc., 
          and list the detected errors 

E0040     To update the original           URBANDOC   Basic        COBOL         Either 1 or 2 
          Document Input File with the 
          contents of the Document 
          Revision File, making additions, 
          deletions and changes 

*Basic: Part of the standard processing of the Document Input File. 

**Set 1: 12K memory; 1402 card reader-punch; 1403 printer; 4 tape drives; 1311 disk drive; 
advanced programming; high-low-equal compare; sense switches. 

Set 2: 8K memory; 1402 card reader-punch; 1403 printer; 4 tape drives; advanced 
programming; high-low-equal compare; sense switches. 

Program Abstracts 

E0010 — Card-To-Tape of the Document Input 
Abstract 

E0010 will create a tape image Document Input File in the format required by the 
Pre-edit and File Maintenance Modules. Advantage will be taken of the maximum 
blocking factor allowed by CFS. Some sequence checking will be performed. 

Programmer's Note 

Revisions to the input. This program can also process the corresponding Document 
Revision File. (See Chapter VII, specifically Document Data Entry. See also Chapter VIII, 
specifically the Editing and Validation Cycle and Input Processing Cycle). 

E0015 — Sort the Document Input File 

Abstract 

The Document Input File is created in the same sequence as the Document Input. Since 
the Document Master File will be updated with this input, the Document Input File must 
be sorted to document number and unit number sequence. 

Programmer's Notes 

Revisions to the input. This program can also process the corresponding Document 
Revision File. (See Chapter VII, specifically Document Data Entry. See also Chapter VIII, 
specifically the Editing and Validation Cycle and Input Processing Cycle.) 

Sequence of the Document Input File. The total input file will be sorted to document 
number sequence by the use of the document number field in each record. The unit 
number field contains a sequential number which serves to keep the individual records 
within the reference in the correct sequence. (See Chapter VII, specifically Document 
Data Entry.) 
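
The sort described here amounts to a two-part key: document number (columns 1-14) followed by unit number (columns 15-18). A minimal sketch, assuming the records are held as fixed-length strings and that ordinary character comparison is an acceptable stand-in for the machine's collating sequence:

```python
def sort_key(record):
    """Document number (cols 1-14), then unit number (cols 15-18)."""
    return (record[0:14], record[14:18])


def sort_document_input(records):
    """Return the records in document number and unit number sequence."""
    return sorted(records, key=sort_key)
```

Because Python's `sorted` is stable, records that share both key fields keep their original input order.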

E0020 — Expand and List the Document Input File 

Abstract 

The Document Input File will be expanded to the full format required by the CFS 
system. The program will generate such information as the heading units for a reference, 
the additional geographic entries for SMSA and city size, the subject headings and the 
imprint and collation information for subdocuments. The program will also expand to 
CFS format the date entries and subdescriptor entries. A formatted listing of the input or 
output may also be produced. 

Programmer's Notes 

Expansion of the input. In an attempt to maximize the rate of creating document input 
while minimizing the data entry costs, URBANDOC devised a set of basic input to be 
entered. (See Chapter VII, specifically Document Data Entry. See also the General 
Manual, Chapter VI, specifically the Pre-edit Module.) E0020 was designed to expand 
this basic data to the format of the CFS system by generating additional entries and 
completing certain fields in other entries. 

Revision to the input. Because E0020 always generates the additional entries for the 
reference, this program cannot be used during the Input Processing Cycle. (See Chapter 
VIII, specifically the Input Processing Cycle.) If the program were used, problems 
such as the duplication of entries and the accidental deletion of desired information 
could arise. 



The original purpose of E0020 was to reduce the amount of effort on the part of the 
document analyst in entering the data into the system. Since this program cannot be used 
for revisions, this data must be entered according to CFS specifications. (See Chapter VII, 
specifically Editing and Validation.) 

Reference to an index journal issue. It is possible to insert, as a descriptor, the issue 
number of the Input Index in which the reference appears. (See Chapter IX, specifically 
the operating instructions for E0020.) Since this particular technique has been used only 
in test situations, its actual utility is yet to be determined. It could be of help in creating 
cumulative issues for the Input Index. (See Chapter VIII, specifically Miscellaneous 
Publications Products.) 



E0030 — Edit the Document Input File 
Abstract 

The Full Document Input File will be edited for errors in sequence, coding, format, 
authorized bibliographic elements, etc. The detected errors will be printed with an error 
code as part of the Pre-edit Listing. 

Programmer's Notes 

Terminal punctuation for geographic terms. Originally, one of two terminal punctuation 
characters was required for each geographic term for which a numeric substitution is 
generated. This check was accordingly built into the system. When terms 
were added that have no numeric substitution, no punctuation character was specified as 
the last character of the term, e.g. "Miami Valley Ohio Region". However, the program 
continued to check for the terminal punctuation character for these terms. When terminal 
punctuation is missing, an error message will occur even though the term may be correct. 
For these cases, ignore the message. 

More than nine errors. The system based its edit checking on the assumption that no 
more than nine errors would occur within any one unit. If a unit should contain more 
than nine errors, the additional ones will not be detected. 



Valid bibliographic elements. E0030 is one of the few programs that checks element 
numbers against a preprogrammed list rather than through a lead card, that is, a program 
control card providing data required for processing. If the authorized bibliographic 
element numbers are changed, the program must be modified and recompiled or incorrect 
messages will result. 

E0040 — Revise the Document Input File 
Abstract 

The Full Document Input File will be revised by the Document Revision File. Entire 
references or units of a reference (one or more of the logical records) may be added or 
deleted. Individual units may be replaced. A listing of the processed transactions will be 
printed with codes for the action taken. Unprocessed units will also be listed. 

Input Specifications 



Document Input 

(See Chapter VII, specifically the Document Worksheet) 
a. Descriptor Format: 

Field                       Size     Cols 

Document Number             X(14)    1-14 
Unit Number                 X(4)     15-18 
Entry Code                  X        19 
Entry Type                  X        20 
Expand Code                 X(2)     21-22 
Pre-code                    X        23 
Descriptor                  X(35)    24-58 
First Subdescriptor         X(12)    59-70 
Filler                      X(10)    71-80 

b. Subdescriptor Format: 

Field                       Size     Cols 

Document Number             X(14)    1-14 
Unit Number                 X(4)     15-18 
Entry Code                  X        19 
Entry Type                  X        20 
Filler                      X(14)    21-34 
Additional Subdescriptor    X(12)    35-46 
Filler                      X(34)    47-80 



c. Dates Format: 

Field                       Size     Cols      Contents 

Document Number             X(14)    1-14 
Unit Number                 X(4)     15-18 
Entry Code                  X        19 
Entry Type                  X        20 
Filler                      X(2)     21-22 
Pre-code                    X        23 
Descriptor                  X(5)     24-28     '#DATES' 
Filler                      X        29 
Publications Date           X(4)     30-33     yyyy 
Filler                      X        34 
Entry Date                  X(6)     35-40     yyyymm 
Filler                      X        41 
Content Date                X(4)     42-45     yyyy 
Filler                      X        46 
Content Date                X(4)     47-50     yyyy, used only for ranges 
Filler                      X(30)    51-80 


d. Bibliographic Data Format: 

Field                       Size     Cols 

Document Number             X(14)    1-14 
Unit Number                 X(4)     15-18 
Entry Code                  X        19 
Entry Type                  X        20 
Element Number              X(2)     21-22 
Bibliographic Line          X(58)    23-80 




Document Revisions 


(See Chapter VII, specifically Document Master File Revisions) 




a. Header Format: 

Field                       Size     Cols 

Document Number             X(14)    1-14 
Unit Number                 X(4)     15-18 
Entry Code                  X        19 
Entry Type                  X        20 
Filler                      X(2)     21-22 
Data Entry Date             X(24)    23-46 
Filler                      X(34)    47-80 




b. Descriptor Format: 

Field                       Size     Cols      Contents 

Document Number             X(14)    1-14 
Unit Number                 X(4)     15-18 
Entry Code                  X        19 
Entry Type                  X        20 
Filler                      X(2)     21-22 
Pre-code                    X        23 
Descriptor                  X(35)    24-58 
First Subdescriptor         X(12)    59-70 
Numerical Value             X(10)    71-80     Used only for publications date 

c. Subdescriptor Format: 

Field                       Size     Cols      Contents 

Document Number             X(14)    1-14 
Unit Number                 X(4)     15-18 
Entry Code                  X        19 
Entry Type                  X        20 
Descriptor Tie              X(2)     21-22 
Subdescriptor Number        X(2)     23-24 
Descriptor Root             X(6)     25-30 
Filler                      X(4)     31-34 
Subdescriptor               X(12)    35-46     Used only for content dates or 
                                               entry date 
Numerical Value             X(10)    47-56     Used only for content dates or 
                                               entry date 
Filler                      X(2)     57-58 
Subdescriptor               X(12)    59-70     Used only for content dates 
Numerical Value             X(10)    71-80     Used only for content dates 

d. Bibliographic Data Format: 

Field                       Size     Cols 

Document Number             X(14)    1-14 
Unit Number                 X(4)     15-18 
Entry Code                  X        19 
Entry Type                  X        20 
Element Number              X(2)     21-22 
Bibliographic Line          X(58)    23-80 

e. Subject Heading Format: 

Field                       Size     Cols 

Document Number             X(14)    1-14 
Unit Number                 X(4)     15-18 
Entry Type                  X        19 
Entry Code                  X        20 
Element Number              X(2)     21-22 
Subject Code                X        23 
Subject Sequence            X        24 
Subject Heading             X(35)    25-59 
Filler                      X(21)    60-80 

f. Delete Reference Format: 

Field                       Size     Cols 

Document Number             X(14)    1-14 
Filler                      X(4)     15-18 
Entry Code                  X        19 
Entry Type                  X        20 
Filler                      X(60)    21-80 



Tape File Specifications 

Document Input File 
Sorted Document Input File 

a. File Format: 

No header label 
Data records, tape mark 
No trailer label 

b. Data Records Format: 

Fixed-length records of 84 characters 
Blocking factor of 10 
Padding record of 9s 

Five data records: 

Descriptor 

Subdescriptor 

Dates 

Bibliographic Data 
Delete Reference 
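
The blocking arrangement above (84-character records, ten to a block, padding records of 9s) can be pictured with a short sketch. It assumes, since the text does not say so explicitly, that padding records are used only to fill out a final short block; the function name is hypothetical.

```python
RECORD_LENGTH = 84
BLOCKING_FACTOR = 10
PADDING_RECORD = "9" * RECORD_LENGTH


def block_records(records):
    """Group 84-character records into blocks of ten, padding the last."""
    for record in records:
        if len(record) != RECORD_LENGTH:
            raise ValueError("every record must be 84 characters")
    blocks = []
    for start in range(0, len(records), BLOCKING_FACTOR):
        group = list(records[start:start + BLOCKING_FACTOR])
        # Assumption: a short final block is filled with records of 9s.
        group += [PADDING_RECORD] * (BLOCKING_FACTOR - len(group))
        blocks.append("".join(group))
    return blocks
```

Each block is therefore always 840 characters long, whatever the number of data records written.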

c. Descriptor Format: 

Field                       Size     Cols 

Document Number             X(14)    1-14 
Unit Number                 X(4)     15-18 
Entry Code                  X        19 
Entry Type                  X        20 
Expand Code                 X(2)     21-22 
Pre-code                    X        23 
Descriptor                  X(35)    24-58 
First Subdescriptor         X(12)    59-70 
Filler                      X(13)    71-83 
Record Mark                 X        84 




d. Subdescriptor Format: 

Field                       Size     Cols 

Document Number             X(14)    1-14 
Unit Number                 X(4)     15-18 
Entry Code                  X        19 
Entry Type                  X        20 
Filler                      X(14)    21-34 
Additional Subdescriptor    X(12)    35-46 
Filler                      X(37)    47-83 
Record Mark                 X        84 




e. Dates Format: 

Field                       Size     Cols      Contents 

Document Number             X(14)    1-14 
Unit Number                 X(4)     15-18 
Entry Code                  X        19 
Entry Type                  X        20 
Filler                      X(2)     21-22 
Pre-code                    X        23 
Descriptor                  X(6)     24-29 
Publications Date           X(4)     30-33     yyyy 
Filler                      X        34 
Entry Date                  X(6)     35-40     yyyymm 
Filler                      X        41 
Content Date                X(4)     42-45     yyyy 
Filler                      X        46 
Content Date                X(4)     47-50     yyyy, used only for ranges 
Filler                      X(33)    51-83 
Record Mark                 X        84 




f. Bibliographic Data Format: 

Field                       Size     Cols 

Document Number             X(14)    1-14 
Unit Number                 X(4)     15-18 
Entry Code                  X        19 
Entry Type                  X        20 
Element Number              X(2)     21-22 
Bibliographic Line          X(58)    23-80 
Filler                      X(3)     81-83 
Record Mark                 X        84 




g. Delete Reference Format: 

Field                       Size     Cols 

Document Number             X(14)    1-14 
Filler                      X(4)     15-18 
Entry Code                  X        19 
Entry Type                  X        20 
Filler                      X(63)    21-83 
Record Mark                 X        84 

Full Document Input File 
Document Revision File 
Sorted Document Revision File 

a. File Format: 

No header labels 
Data records, tape mark 
No trailer labels 

b. Data Records Format: 

Fixed-length records of 84 characters 
Blocking factor of 10 
Padding record of 9s 

Five data records: 

Header 
Descriptor 
Subdescriptor 
Bibliographic Data 
Subject Heading 

c. Header Format: 

Field                       Size     Cols 

Document Number             X(14)    1-14 
Unit Number                 X(4)     15-18 
Entry Code                  X        19 
Entry Type                  X        20 
Filler                      X(2)     21-22 
Data Entry Date             X(24)    23-46 
Filler                      X(37)    47-83 
Record Mark                 X        84 

d. Descriptor Format: 

Field                       Size     Cols      Contents 

Document Number             X(14)    1-14 
Unit Number                 X(4)     15-18 
Entry Code                  X        19 
Entry Type                  X        20 
Filler                      X(2)     21-22 
Pre-code                    X        23 
Descriptor                  X(35)    24-58 
First Subdescriptor         X(12)    59-70 
Numerical Value             X(10)    71-80     Used only for publications date 
Filler                      X(3)     81-83 
Record Mark                 X        84 

e. Subdescriptor Format: 

Field                       Size     Cols      Contents 

Document Number             X(14)    1-14 
Unit Number                 X(4)     15-18 
Entry Code                  X        19 
Entry Type                  X        20 
Descriptor Tie              X(2)     21-22 
Subdescriptor Number        X(2)     23-24 
Descriptor Root             X(6)     25-30 
Filler                      X(4)     31-34 
Subdescriptor               X(12)    35-46     Used only for content dates 
                                               or entry date 
Numerical Value             X(10)    47-56     Used only for content dates 
                                               or entry date 
Filler                      X(2)     57-58 
Subdescriptor               X(12)    59-70     Used only for content dates 
Numerical Value             X(10)    71-80     Used only for content dates 
Filler                      X(3)     81-83 
Record Mark                 X        84 

f. Bibliographic Data Format: 

Field                       Size     Cols 

Document Number             X(14)    1-14 
Unit Number                 X(4)     15-18 
Entry Code                  X        19 
Entry Type                  X        20 
Element Number              X(2)     21-22 
Bibliographic Line          X(58)    23-80 
Filler                      X(3)     81-83 
Record Mark                 X        84 

g. Subject Heading Format: 

Field                       Size     Cols 

Document Number             X(14)    1-14 
Unit Number                 X(4)     15-18 
Entry Type                  X        19 
Entry Code                  X        20 
Element Number              X(2)     21-22 
Subject Code                X        23 
Subject Sequence            X        24 
Subject Heading             X(35)    25-59 
Filler                      X(24)    60-83 
Record Mark                 X        84 

References to the URBANDOC Final Report 

Much of the information presented in this chapter is designed to be used with sections 
of the General Manual (G.M.) and other sections of the Operations Manual (O.M.). 

For additional information on the Document Input File, its format and the considerations 
in creating and maintaining the entries on the file, see: 

Manual    Chapter                                  Section 

G.M.      I - Introduction                         The Bibliographic Records 
G.M.      II - Document Identification             URBANDOC Document Numbers 
                                                   General 
G.M.      III - Document Analysis:                 General Considerations 
          Descriptive 
G.M.      IV - Document Analysis:                  General Considerations 
          Content 

For more information on the design and goals of the Pre-edit Module, see: 

Manual    Chapter                                  Section 

G.M.      VI - Systems Modules: Input              Pre-edit Module: Function, 
                                                   Tasks 

For more information on operating this portion of the system, see: 

Manual    Chapter                                  Section 

O.M.      VIII - Processing Cycles                 Editing and Validation 
                                                   Cycle, 
                                                   Input Processing Cycle 
O.M.      IX - Operating Instructions              Pre-edit Module 
O.M.      XI - Tape Library and Report             Pre-edit Module 
          Controls 
O.M.      XII - Timing                             Pre-edit Module 

The Document Master File 

The Document Master File is composed of references to individual documents. Each 
reference is entered as a series of individual tape records, each record containing a 
specific, integral unit of information. The individual tape record will be referred to as a 
"record". "Reference" will apply to the aggregate of all tape records for a document. 

The primary uses of the Document Master File are in the identification, isolation and 
presentation of references. Some of the entered information describes and defines the 
contents; this portion, subject to interrogation in a search, is called the Searchable 
Information. The remaining data is useful in the presentation of the document reference 
and during a manual bibliographic search. It is not used during a computer search. This 
information is called Free Text Information. In addition to these two types, the system 
also uses a third, Program Utility Information, to facilitate computer processing during 
file maintenance and search. Program Utility Information is internal to the programming 
system and is not made available to the user. 

The Searchable Information or the Searchable Record is the first tape record of the 
reference. The Free Text Information is stored as a variable number of Free Text Records 
with a maximum of ninety-nine records. The Searchable Record is composed of the 
Document Master Identification, Program Utility Data and Searchable Data. The Free 
Text Record is composed of the Document Master Identification and Free Text Data. 
(See Figure 3.) 

The Document Master Identification contains the document identification number, 
segment number, and date of data entry. The document number and date of data entry are 
the same for all records of a reference. The segment number is a three-position code 
identifying the contents of the record. For a Searchable Record, the code is '000'. For 
Free Text Records, the code is that of the bibliographic element entered in the record. 
(These codes are listed later in the section.) 

The Program Utility Data in the Searchable Record is for purposes of program efficiency 
during search. This section immediately follows the Document Master Identification in 
the record. The Program Utility Data is a table of pointers, with one entry for each 
descriptor assigned to the reference. Each pointer is eleven characters giving information 
about the location and nature of the descriptors (See Figure 3.): 

Address: a three-position field containing the displacement (from the first position 
of the individual tape record) of the descriptor within the Searchable Record. For 
those descriptors with a displacement greater than 999, the address is represented in 
the 1401 machine-language code for core storage addresses (See Figure 2.); 

Rank: a three-position field indicating the descriptor's sequence in the reference. 
For example, if the descriptor were the first assigned, rank would be '001'; 

Size: a one-position code for the length of a descriptor. Each descriptor is stored as 
either a fixed-length field of twelve characters (size 1) or twenty-four characters 
(size 2) (See Chapter VII, specifically Document Data Entry.); 

The number of modifiers assigned to each descriptor. If a descriptor has no 
modifiers, this field is zero; 

Type code: the descriptor's status: precise, common ('#') or internal. 

While the descriptors are stored in the order in which they were assigned, the pointers are 
arranged alphabetically by descriptor to increase the efficiency of the search process. The 
end of the Program Utility Section is designated by a dummy pointer of three characters 
coded 'END'. 

The descriptors and modifiers immediately follow the 'END' constant of the pointer 
table. The number of descriptors assigned to a record cannot exceed ninety-nine. The 
number of modifiers assigned to one descriptor cannot exceed ninety-nine. However, the 
overriding limit on the Searchable Record is that the combined length of all the sections 
of the record cannot exceed one tape record of 2200 characters. (See Figure 3.) 
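
The limits in this paragraph can be combined into a simple length check. The sketch below is an assumption-laden illustration rather than the system's own calculation: it uses the eleven-character pointer, the three-character 'END' constant, the twelve- and twenty-four-character descriptor sizes from the pointer's size code, and the twenty-one-character modifier. The length of the Document Master Identification is not stated in this section, so it is passed in as a hypothetical parameter.

```python
POINTER_LENGTH = 11
END_LENGTH = 3                          # the 'END' dummy pointer
DESCRIPTOR_LENGTHS = {1: 12, 2: 24}     # size code -> stored length
MODIFIER_LENGTH = 21
MAX_RECORD = 2200
MAX_DESCRIPTORS = 99
MAX_MODIFIERS = 99


def searchable_record_length(descriptors, identification_length):
    """Length of a Searchable Record.

    `descriptors` is a list of (size_code, modifier_count) pairs;
    `identification_length` is a caller-supplied assumption.
    """
    length = identification_length
    length += POINTER_LENGTH * len(descriptors) + END_LENGTH
    for size_code, modifier_count in descriptors:
        length += DESCRIPTOR_LENGTHS[size_code]
        length += MODIFIER_LENGTH * modifier_count
    return length


def record_is_valid(descriptors, identification_length):
    """Apply the three limits: 99 descriptors, 99 modifiers, 2200 characters."""
    if len(descriptors) > MAX_DESCRIPTORS:
        return False
    if any(count > MAX_MODIFIERS for _, count in descriptors):
        return False
    return searchable_record_length(descriptors,
                                    identification_length) <= MAX_RECORD
```

With these figures the 2200-character limit is reached long before ninety-nine long descriptors can be stored, which is why the text calls it the overriding limit.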

The descriptor is the basic element of the search. All descriptors within a single record are 
considered to be independent and of parallel significance. The modifier, when assigned, 
represents an elaboration or refinement of a descriptor. It is in a dependent relationship 
to the descriptor to which it is assigned. Modifiers relating to the same descriptor are 
considered to be logically independent of each other. 

A descriptor may be variable in length to a maximum of fifty-four characters when stored 
on the Thesaurus File. (See Chapter VII, specifically Thesaurus Data Entry. See also 
General Manual, Chapter VI, specifically the Thesaurus File Description.) When stored on 
the Document Master File, each descriptor is a fixed-length field of either eleven or 
twenty-three characters. Smaller entries will be padded with blanks to conform to the 
eleven or twenty-three character length specification. The fixed-length field was adopted 
for programming and tape storage efficiency. (See Chapter VII, specifically Document 
Data Entry.) 

The modifier is used to make more specific use of a descriptor. This may be done either 
through a natural language subdescriptor or a value (numerical subdescriptor). Either or 
both portions of the modifier may be used. Regardless of the portion completed, the 
modifier is stored as a fixed-length field of twenty-one characters. 

The natural language subdescriptor is a secondary-level descriptor. (See Chapter VII, 
specifically Thesaurus Data Entry. See also General Manual, Chapter VI, specifically the 
Thesaurus File Description.) In the Document Master File it is stored as a fixed-length field 
of twelve characters within the modifier. A pre-code is not used with this field. Like the 
descriptor, longer terms will be truncated. Shorter terms will be padded with blanks to 
twelve characters. (See Chapter VII, specifically Document Data Entry.) 

The numerical value, in the CFS system, is a variation of the scientific notation for 
decimal numbers. Each number or value is represented through eight characters of 
numerical data. Each value is composed of a six-position fraction and a two-position 
exponent representing a power of 10. For example: the number 1970 would consist of 
the fraction .197000 and an exponent of 04. That is, the number 1970 could be obtained 
by multiplying the fraction .197000 by 10,000 (10 to the fourth power). CFS also makes 
provision for a sign character, indicating whether the number has a positive or a negative 
value. For URBANDOC's purposes, numbers are always positive. 
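
The fraction-and-exponent representation can be reproduced with a short sketch. It is an illustration only: the function names are hypothetical, digits beyond the sixth are assumed to be dropped, the exponent is assumed to be limited to 00-99, and the order in which the two parts are stored within the eight characters is not assumed.

```python
from decimal import Decimal


def encode_value(number):
    """Return (six-digit fraction, two-digit exponent) for a value >= 0."""
    value = Decimal(str(number))
    if value < 0:
        raise ValueError("URBANDOC values are always positive")
    if value == 0:
        return "000000", "00"
    digits = "".join(str(d) for d in value.as_tuple().digits)
    exponent = value.adjusted() + 1        # so that 0.1 <= fraction < 1
    if not 0 <= exponent <= 99:
        raise ValueError("exponent does not fit in two positions")
    fraction = (digits + "000000")[:6]     # pad or cut to six positions
    return fraction, "%02d" % exponent


def decode_value(fraction, exponent):
    """Rebuild the number: .fraction times 10 to the exponent."""
    return Decimal("0." + fraction) * Decimal(10) ** int(exponent)
```

For the example in the text, `encode_value(1970)` gives the fraction `"197000"` and the exponent `"04"`, and `decode_value("197000", "04")` returns 1970.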

Numerical values may range from an upper to a lower limit. To indicate whether a value is 
a single value or the upper or lower limit of a range, a letter code is included as part of 
each modifier. All modifiers must immediately follow the parent descriptor in the record. 
(See Chapter VII, specifically Document Data Entry.) 

All modifiers relate directly to the descriptor. If the subdescriptor portion of a modifier is 
present, the numerical value applies directly to the subdescriptor. If the subdescriptor is 
not used, the value applies to the descriptor itself. URBANDOC uses a numerical value 
only with date analysis descriptors. However, if used elsewhere, the same rules apply. 

The Free Text Section contains the bibliographic information about a document 
reference. The organization of this information is by element, for example, author, title, 
imprint and collation, subject headings, etc. Within each element of bibliographic 
information, the data is neither reformatted nor edited (except for valid element 
number). 

The Free Text Information is divided into tape records (or segments) with one record 
corresponding to one bibliographic element. The CFS system provided for ninety-nine 
elements; except for one record, it placed no restrictions on the use of these records by 
the document analyst and/or systems analyst. The one identified segment was '99' to be 
used for subject headings. The purpose of segmentation is to permit the selection and 
printing of relevant portions of bibliographic information, thus avoiding the unnecessary 
recall of the entire reference. (See Figure 3.) 

URBANDOC did not identify ninety-nine bibliographic elements which it wished to 
include. It identified a total of twenty-five bibliographic elements which could be 
entered: 

01 personal author 

02 corporate author 

03 anonymous 

04 acronyms 

05 joint personal author 

06 joint corporate author 

07 consultant 

08 miscellaneous corporate name 

09 miscellaneous local place name 

10 corporate author name cross-reference 

11 distinctive title 

13 distinctive series title 

15 non-distinctive title 

16 French title, if available 

17 German title, if available 

18 Spanish title, if available 

21 imprint 

22 imprint for a subdocument 

23 abstract and notation of content 
25 UPAP project number 

31 statutory citation 

36 literature citation 

56 acquisition information 
97 geographic index name 

99 subject heading 

The element numbers used by URBANDOC are not consecutive. Gaps exist to add new 
elements in a particular sequence or because previously used elements were discontinued. 
The sequence of bibliographic elements is significant since this information is always 
retrieved from the Document Master File in element number sequence. 
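
The retrieval rule can be illustrated with a short sketch. It is not part of the CFS system (whose programs are written in AUTOCODER); the function name and the sample reference are hypothetical, and only the element numbers and names come from the list above.

```python
# Element numbers and names copied from the list of twenty-five elements above.
ELEMENTS = {
    1: "personal author",
    2: "corporate author",
    3: "anonymous",
    4: "acronyms",
    5: "joint personal author",
    6: "joint corporate author",
    7: "consultant",
    8: "miscellaneous corporate name",
    9: "miscellaneous local place name",
    10: "corporate author name cross-reference",
    11: "distinctive title",
    13: "distinctive series title",
    15: "non-distinctive title",
    16: "French title, if available",
    17: "German title, if available",
    18: "Spanish title, if available",
    21: "imprint",
    22: "imprint for a subdocument",
    23: "abstract and notation of content",
    25: "UPAP project number",
    31: "statutory citation",
    36: "literature citation",
    56: "acquisition information",
    97: "geographic index name",
    99: "subject heading",
}


def in_retrieval_order(segments):
    """Return (element number, text) pairs in element number sequence.

    Hypothetical helper: rejects element numbers that are not in the
    authorized list, then sorts by element number.
    """
    for number, _text in segments:
        if number not in ELEMENTS:
            raise ValueError(f"element {number:02d} is not an authorized element")
    return sorted(segments, key=lambda segment: segment[0])


# Invented sample reference, entered out of sequence.
reference = [
    (99, "URBAN RENEWAL"),
    (21, "New York, 1966"),
    (1, "Doe, Jane"),
    (11, "A Sample Title"),
]
ordered = in_retrieval_order(reference)
```

Because the sort is stable, repeated entries for the same element keep the order in which they were entered.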



The size of the total Free Text Information is a function of the number of elements 
assigned and the maximum size for each element's entry. There may be a maximum of 
ninety-nine elements of bibliographic information. Each element (including the Document 
Master Identification) may be no more than one tape record (2200 characters). Therefore, 
the total Free Text Information may be no more than 99 x 2200 characters (217,800 
characters). 

The bibliographic information begins immediately after the Document Master 
Identification. This corresponds to the first character in the pointer table of the 
Searchable Record. Each line of bibliographic information entered is given a terminal 
punctuation character of a record mark (0-2-8) by the CFS system. This character must 
be included when calculating the size of a bibliographic element. 
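
The size limits above reduce to simple arithmetic, sketched below. The constants come from the text; the function names are hypothetical, and the sketch assumes that an element's size is the entered text plus one record mark per line, ignoring any other overhead the system may store with the record.

```python
MAX_ELEMENTS = 99        # elements of bibliographic information
MAX_TAPE_RECORD = 2200   # characters in one tape record
RECORD_MARK = 1          # terminal character added to each entered line


def max_free_text_size():
    """Upper bound on the Free Text Information, in characters."""
    return MAX_ELEMENTS * MAX_TAPE_RECORD


def element_size(lines):
    """Size of one element: each line plus its record mark (assumption)."""
    return sum(len(line) + RECORD_MARK for line in lines)


def fits_in_tape_record(lines):
    """True if the element would fit within one tape record."""
    return element_size(lines) <= MAX_TAPE_RECORD
```

For example, an element entered as two lines of 3 and 2 characters occupies 7 characters once the record marks are counted.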



The Inverted File 

The Inverted File is a series of records for each descriptor used by the system. Each 
individual record dealing with the usage of one descriptor is composed of two sections: 
the Identification Section and the Document Number List. 

The Identification Section consists of a series of fields which serve to define the way the 
term has been used: 

Type: precise, common or internal descriptor; 

Frequency count: number of documents in which the term has been used as a 
descriptor; 

Date of last addition to the file as a descriptor; 

Date of last deletion from the file as a descriptor; and 

Highest document number in the file (used only by the system when searching). 



DOCUMENT MASTER FILE RECORD LAYOUT 1 

[Diagram: the file is a sequence of items; each item is a logical record made up of tape records.] 

FIELD ADDRESS - position of field relative to 1st character of tape record 

RANK - sequential position of field as actually stored 

SIZE - 1 - 12 characters 
       2 - 24 characters 

NO. OF QUALIFIERS - number of fixed-length fields associated with descriptor 

TYPE - code distinguishing precise, temporary precise, common, and informative field 

MAXIMUM TAPE RECORD LENGTH: 2200 characters 

1 International Business Machines, Data Processing Division, 1401 Information Storage and Retrieval 
System. (The Combined File Search System) by Donald Prentice, Gary de Graw, Alice Smith and 
I. Albert Warheit, Rev. III, 1 v. (White Plains, N.Y., 1966), p. 1.08. 

Figure 3 

The Document Number List consists of a variable number of fourteen-character fields 
containing the document numbers in which the term has been used as a descriptor. The 
maximum number of entries within one record is one hundred and fifty. If more than this 
number of postings occur, there will be a continuing record on the Inverted File for this 
descriptor. 
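
The continuation rule can be sketched as follows. This is an illustration under stated assumptions, not the CFS implementation: the function name is hypothetical, and the postings are assumed to be held in document number order and padded to the fourteen-character field width.

```python
MAX_POSTINGS = 150   # maximum entries in one Inverted File record
FIELD_WIDTH = 14     # characters per document number field


def split_postings(document_numbers):
    """Split one descriptor's postings into records of at most 150 entries.

    Returns a list of records; the second and later records stand for the
    continuing records described above.
    """
    ordered = sorted(number.ljust(FIELD_WIDTH) for number in document_numbers)
    return [
        ordered[start:start + MAX_POSTINGS]
        for start in range(0, len(ordered), MAX_POSTINGS)
    ]


# Invented example: a descriptor posted to 320 documents.
records = split_postings([f"DOC{n:05d}" for n in range(320)])
```

In this example the descriptor needs three records, holding 150, 150 and 20 postings.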



INVERTED FILE RECORD LAYOUT 2 

[Diagram: the file is a sequence of descriptor records in descriptor order. Each individual record is laid out as follows.] 

Record Identification 

    Descriptor name and type          (24 characters) 
    Number of items posted            (5 characters) 
    Date of last item addition        (6 characters) 
    Date of last item deletion        (6 characters) 
    Highest item number in list       (14 characters) 
    Flag                              (1 character) 

Item List (in item number order) 

    Item number no. 1                 (14 characters) 
    Item number no. 2 
    ... 
    Item number 

Item list present for precise descriptors only. 

2 Ibid., p. 1.12. 

Figure 4 
Program Inventory 

Program   Program Function                 Source      Basic or     Programming       Configuration 
Number                                                 On-demand*   Language          Set** 

MAINT1    To format the input for CFS      CFS         Basic        Tape AUTOCODER    Either 1 or 2 
          specifications and perform 
          the final sequence check 

MAINT2    To validate the descriptors,     CFS         Basic        Tape AUTOCODER    Either 1 or 2 
          subdescriptors and subject 
          headings against the 
          Thesaurus File for such 
          factors as authorized terms, 
          usage of terms, etc. 

MAINT3    To print the CFS Edit Listing    CFS         Basic        Tape AUTOCODER    Either 1 or 2 

MAINT4    To update the Document           CFS         Basic        Tape AUTOCODER    Either 1 or 2 
          Master File 

MAINT5    To provide tape labelling        CFS         On-demand    Tape AUTOCODER    Either 1 or 2 
          instructions for output of 
          MAINT4 and set-up 
          instructions for MAINT6 

MAINT6    To print Master File             CFS         Basic        Tape AUTOCODER    Either 1 or 2 
          Activities and create the 
          Descriptor Input File 

MAINT7    To copy the Document Master 
          File or recreate the 
          Descriptor Input File 

MAINT8    To generate and print            URBANDOC    Basic        Tape AUTOCODER    Either 1 or 2 
          statistics for the contents 
          of the Document Master File 

MAINT9    To generate the Document         URBANDOC    On-demand    Disk AUTOCODER    Either 1 or 2 
          Master Subset File 

LIBPRT    To print a listing of the        CFS         On-demand    Tape AUTOCODER    Either 1 or 2 
          Document Master File 

DMAIN1    To create the Inverted File      CFS         Basic        Tape AUTOCODER    Either 1 or 2 
          and check for certain types 
          of errors 

DMAIN2    To produce the Summary           CFS         Basic        Tape AUTOCODER    Either 1 or 2 
          Listing of the Inverted File 
          for an overview of each term 
          including frequency, type, 
          etc. 

DMAIN3    To produce the Detail Listing    CFS         Basic        Tape AUTOCODER    Either 1 or 2 
          of the usage of terms on the 
          Inverted File with postings 
          of the document numbers in 
          which the terms have been 
          used 

*  Basic: Part of the standard processing when updating the Document Master File. 

   On-demand: Performed only upon request. 

** Set 1: 12K memory; 1402 card reader-punch; 1403 printer; 4 tape drives; 1311 disc drive; 
   advanced programming; high-low-equal compare; sense switches. 

   Set 2: 8K memory; 1402 card reader-punch; 1403 printer; 4 tape drives; advanced 
   programming; high-low-equal compare; sense switches. 

Program Abstracts 

MAINT1 — Format the Document Input File 
Abstract 

The first step in the file maintenance procedures is to format the Document Input File 
according to CFS specifications. This includes the final verification of the sequence of the 
input by document number and unit number. Out-of-sequence units are listed and 
removed from further processing. Among the valid input, document numbers, descriptors, 
subdescriptors and subject headings are reduced to their most compact form. Two output 
files are created: the formatted valid input (Docu-to-Tape File) and all terms to be 
validated against the Thesaurus File (Look-Up Input File). 

Programmer's Notes 

Compaction of descriptors, subdescriptors and subject headings. MAINT1 scans the input 
terms for blanks within the field. Any leading blanks or embedded blanks (except for the 
word separator) in the term will be removed. The intellectual combination of descriptors 
and subdescriptors will not be affected by the compaction process since it operates 
independently on each entry. Multiple words within terms are allowed as long as the 
length limitation is not exceeded. Terms will be compacted to the extent that only one 
blank space is left between words. The compacted term will be stored left-justified in the 
field to agree with the terms as stored on the Thesaurus File. (See Chapter II, specifically 
the Thesaurus File and XMAIN1.) 
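
The compaction rule for terms can be sketched in a few lines. This is an illustration only, not the MAINT1 code: the function name is hypothetical, and the default width of 35 characters is an assumption taken from the descriptor field of the Docu-to-Tape File.

```python
def compact_term(raw, width=35):
    """Compact a term as described above.

    Leading blanks are dropped, runs of embedded blanks are reduced to a
    single word separator, and the result is stored left-justified in a
    fixed-width field.
    """
    compacted = " ".join(raw.split())
    if len(compacted) > width:
        raise ValueError(f"term exceeds the {width}-character field")
    return compacted.ljust(width)


# Invented input with leading and repeated embedded blanks.
field = compact_term("  URBAN    RENEWAL  ")
```

Each entry is compacted independently of the others, which is why the pairing of a descriptor with its subdescriptors is unaffected.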

Compaction of document numbers. MAINT1 will scan each document number and 
remove any leading or embedded blanks in the field. After the removal of the leading or 
embedded blanks, the document number will be shifted left in the field. The removal of 
these blanks can create serious problems for both processing and file organization. First, 
the embedded blanks have been included to maintain consistency in document number 
form within a series. (See General Manual, Chapter II, specifically URBANDOC 
Document Numbers: By Series.) For example, consider the document number 'DRAA 
LP66 391'. As a result of MAINT1, it would appear as 'DRAALP66391' on the 
Edit File and 'DRAA LP66 391' on the Docu-to-Tape File, and the two tapes could not be 
match-purged of errors. (See abstracts of SORTM1, MAINT2, SORTM2, MAINT3.) 

URBANDOC has avoided this problem by using a filler character of V in place of the 
blank to prevent the document numbers from being changed. No filler character is 
necessary to complete the end of the field, only to reserve places within the field. 
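
The effect of the filler character can be shown with a small sketch. It is not the MAINT1 code: the function name is hypothetical, the field width of 14 characters is taken from the tape file specifications, and the filler is written as V only because that is how it is printed above.

```python
FIELD_WIDTH = 14  # document number field width on the tape files


def compact_document_number(raw):
    """Remove leading and embedded blanks, then shift left in the field."""
    return raw.replace(" ", "").ljust(FIELD_WIDTH)


# With embedded blanks the stored form no longer matches what was entered.
plain = compact_document_number("DRAA LP66 391")

# With a filler character reserving the places, compaction changes nothing
# except the trailing padding.
filled = compact_document_number("DRAAVLP66V391")
```

Only the places within the number need the filler; the unused end of the field is padded with blanks either way.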

In the event that this error condition did occur, the processing cycle would have to be 
cancelled, the document number in error corrected and processing restarted from the 
beginning. 

Deleting descriptors, subdescriptors and subject headings. When revising the Document 
Master File, it may be necessary to delete descriptors, subdescriptors and subject headings 
from references already on the Document Master File. MAINT1 would include these 
terms on the Look-Up Input File for validation against the Thesaurus File. If these terms 
have been removed from the Thesaurus File the deletion entries must be processed as a 
separate revision cycle in which term validation is bypassed. (See abstract and 
programmer's notes on MAINT3.) 

SORTM1 - Sort the Look-Up Input File 

Abstract 

Since the Look-Up Input File (descriptors, subdescriptors and subject headings) will be 
validated against the Thesaurus File, it must be sorted to the same sequence. The 
Look-Up Input File is created in document number sequence. At the conclusion of the 
sort, the file will be in term sequence. 

MAINT2 — Validate the Look-Up Input File 

Abstract 

The terms on the Look-Up Input File are validated against the Thesaurus File for four 
conditions. (1) Is the term an authorized one? (2) Has the term been used correctly as a 
descriptor or subdescriptor? (3) Is the coding of the term correct? (4) Is there a preferred 
form of the term other than the one used in the input? 

A term failing any of the first three tests is considered an invalid term. A term qualifying 
for the fourth condition is considered to have a substitute. It is here that the change from 
place name to geographic code is made. All other terms are valid for use in their input 
form. Only the invalid terms and the substitutes are included as part of the Bad Look-Up 
File. Along with each term on the Bad Look-Up File is a code indicating the reason for its 
inclusion, either a preferred term or the specific error. 
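
The four tests can be sketched as follows. This is a minimal illustration, not the MAINT2 code: the thesaurus layout, the reason strings and the sample entries are all hypothetical, and "coding of the term" is assumed here to mean the pre-code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThesaurusEntry:
    usage: str                       # "descriptor" or "subdescriptor"
    pre_code: str                    # assumed meaning of "coding of the term"
    preferred: Optional[str] = None  # preferred form, if one exists


def validate(term, usage, pre_code, thesaurus):
    """Return None for a valid term, else the reason for the Bad Look-Up File."""
    entry = thesaurus.get(term)
    if entry is None:
        return "NOT AUTHORIZED"                 # test 1
    if entry.usage != usage:
        return "INCORRECT USAGE"                # test 2
    if entry.pre_code != pre_code:
        return "INCORRECT CODING"               # test 3
    if entry.preferred is not None:
        return "SUBSTITUTE " + entry.preferred  # test 4
    return None


# Invented thesaurus entries; the geographic code is made up.
thesaurus = {
    "HOUSING": ThesaurusEntry("descriptor", "*"),
    "NEW YORK CITY": ThesaurusEntry("descriptor", "*", preferred="G0001"),
}
```

Terms for which `validate` returns `None` stay out of the Bad Look-Up File; every other term is written there together with its reason.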

Programmer's Note 

The publications substitute. The CFS system provides two substitutes for each thesaurus 
term: a search substitute for content analysis and a publications substitute for subject 
headings. URBANDOC conventions require that the analyst enter the preferred form 
during document analysis. As a result, only one substitute was entered into the Thesaurus 
File. Care must be taken to use the preferred form for both descriptors and subject 
headings during document analysis. If the original term is used for a subject heading, it 
will be replaced by blanks during the validation process since there are no subject heading 
substitutes on the Thesaurus File. 

SORTM2 — Sort the Bad Look-Up File 
Abstract 

The Docu-to-Tape File will be revised according to the contents of the Bad Look-Up File. 
Since the Bad Look-Up File is created in term sequence, it must be sorted to document 
number and unit number sequence to agree with the Docu-to-Tape File. This file will 
update the contents of the Docu-to-Tape File as part of MAINT3 processing. 

Abstract 

MAINT3 is the major edit step in the CFS system; it updates the input with the results of 
the Thesaurus File validation and edits the data according to CFS specifications for code 
verification, sequence of bibliographic elements, and combinations of input units. 

All invalid units are removed from further processing and printed as part of the Edit 
Listing. Substitutes remain in the input but are listed in the Edit Listing for audit trail 
purposes. Part of the Edit Listing is an error code to facilitate correction of the units by 
the document analysts. (See Chapter X, specifically MAINT3.) If the unit is acceptable 
because it has not failed any edit check, the unit is reformatted (for sake of program 
efficiency) for the actual master file update. 

Programmer's Notes 

Disposition of Error Codes 30-32. This comment applies only to the processing of 
revisions to the Document Master File. If any errors are detected within the value portion 
of the modifiers, the descriptor and its modifiers will not be processed. Although the 
error message seems to indicate that only the modifiers were non-processed, both the 
descriptor and the modifiers must be reentered. For URBANDOC, this problem only 
occurs for descriptors used in date analysis; the value portion is not used with other 
descriptors. 

Spacing of the Edit Listing. When processing new input, the messages are grouped 
together by document number. This is not the case for revisions. In this instance, the 
spacing between documents is erratic; sometimes error messages for several documents 
will be grouped together. 

Bypassing term validation. MAINT3 offers the user the ability to enter all information 
directly onto the Document Master File and Inverted File without validating descriptors, 
subdescriptors and subject headings, or without making computer substitutions of one 
form for another (as with the geographic descriptors). The user may start file 
maintenance processing directly with MAINT3, bypassing MAINT1, SORTM1, MAINT2 
and SORTM2. (See Chapter IX, specifically the operating instructions for MAINT3.) 
URBANDOC did not use the direct-to-master-file option as part of its processing. 
However, this procedure might be necessary to remove descriptors, subdescriptors and 
subject headings from the Document Master File and Inverted File once the authorized 
entry for the term has been removed from the Thesaurus File. 



MAINT4 — Update the Document Master File 

Abstract 

The Document Master File will be updated with the contents of the Edit File (the valid 
and formatted input). Transactions may include the addition or deletion of entire 
document references, addition or deletion of free text segments (bibliographic elements) 
or changes to the searchable segment (content analysis) or free text segment '99' (subject 
headings). Certain systems conditions such as exceeding the record size or duplication of 
the document number would prevent a transaction from being processed. If any such 
condition is detected, the non-processed record is given a reason code to be included in an 
error listing. (See Chapter X, specifically the section on MAINT6.) Any descriptors are 
flagged for the update of the Inverted File. 

Programmer's Notes 

Inserting descriptors by rank. The CFS documentation states that when revising a 
reference, descriptors may be added either at the end of the Searchable Record or 
inserted into place by rank within the record. URBANDOC discovered a programming 
error when inserting descriptors by rank. The transaction is processed without any error 
messages. However, an incorrect pointer and meaningless descriptor is stored on the 
Document Master File and Inverted File. New descriptors should be added to the end of 
the searchable record; no rank should be specified. (See Chapter VII, specifically the 
section on Document Master File Revisions.) It was not reasonably possible to correct 
this complex CFS error. 



Adding references with duplicate document numbers. The CFS documentation states that 
duplicate document numbers will not be processed. Actually, when a duplicate document 
number is encountered, the descriptive analysis data and the non-repetitive descriptors 
and subdescriptors will be added to the existing reference. That reference must be 
removed from the Document Master File and both references reentered with unique 
identification numbers. 



MAINT5 - Print On-Line Operator Messages 
Abstract 

MAINT5 is an intermediary program which performs no actual processing of the input 
data but which serves as a control program. It provides the operator with instructions for 
labelling the tape output of MAINT4 and for setting up MAINT6. The same information 
is provided in the operating instructions. (See Chapter IX, specifically the operating 
instructions for MAINT4, MAINT5, MAINT6.) 

MAINT6 — Print Update of the Document Master File 
Abstract 

MAINT6 will print the update of the Document Master File and create the input to the 
Inverted File. The transactions to the Document Master File may be printed in their 
entirety, by class, or not at all. If printed by class, the subsets of all additions, all 
deletions, and/or non-processed changes may be selected. Each subset is mutually 
exclusive, but multiple subsets may be printed. For the non-processed changes, the error 
code is also printed. (See Chapter XI, specifically the File Maintenance Module.) The 
transactions involving descriptors are separated from the rest of the bibliographic input as 
the Descriptor Input File. This file is used to update the Inverted File. 

Programmer's Notes 

Listing non-processed changes. On certain occasions, a processing error will occur when 
the operator tries to list the non-processed changes to the Document Master File (Sense 
Switches A and D). The first attempt to remedy the error should be to restart the report 
through Check-Reset, Start-Reset and Start. If this does not correct the problem, restart 
MAINT6 bypassing listing the transactions and proceed directly to the non-processed 
changes. (See Chapter IX, specifically the operating instructions for MAINT6.) 

Differentiation between descriptors and subdescriptors. The descriptors being added to or 
deleted from the references on the Document Master File are further processed for 
inclusion on the Inverted File where the user can obtain such information about the 
descriptor as frequency and place of usage. These results occur only for descriptors. No 
further processing occurs for subdescriptors. Once added to the Document Master File, 
subdescriptors can only be accessed through a computer search in conjunction with the 
descriptor which they modify. 

MAINT7 — Process the Document Master File 

Abstract 

MAINT7 is not one of the programs that is used during normal file maintenance but 
provides additional support for the Document Master File and Inverted File. All 
processing handled by MAINT7 is based on either the reproduction of the Document 
Master File or the recreation of the Descriptor Input File for a new Inverted File. As 
independent actions, either the Document Master File may be copied or a new Descriptor 
Input File created. By combining these capabilities, two Document Master Files may be 
merged and a combined Descriptor Input File created. Checking is done to be sure that 
duplicate document numbers do not exist and the files to be merged are in sequence. 

MAINT8 - Statistical Analysis of the Document Master File 

Abstract 

While the file maintenance procedures provide a count of the number of documents in 
the Document Master File, other statistics are needed. The listings of the Inverted File 
furnish a more detailed analysis of the use of the descriptors. MAINT8 provides an 
analysis of the Document Master File by bibliographic element. Included as part of the 
report is an exception listing of bibliographic entries not in agreement with the current 
list of authorized elements. 

Programmer's Note 

Valid bibliographic elements. MAINT8 is one of the few programs that checks element 
number against a pre-programmed list rather than a lead card, that is, a program control 
card providing data required for processing. The authorized bibliographic elements have 
been changed since the program was first implemented. As a result, some error messages 
will appear that require no action by the user. (See Chapter X, specifically the section on 
MAINT8.) If so desired, the list of element numbers could be revised by reassembling the 
program. Only minor programming modifications are required. 

MAINT9 - Subset the Document Master File 

Abstract 

With a continually increasing file size, it is sometimes necessary to work with a portion of 
the file - perhaps a class of documents or specific documents. MAINT9 allows the 
Document Master File to be subdivided into groups based on document number. 
Subdivision can be as broad as document class or as specific as individual document with 
many alternatives in between. The subdivided file can be used in exactly the same ways as 
the complete Document Master File. 

LIBPRT — Library Print of the Document Master File 
Abstract 

MAINT6 and LIBPRT are part of the same program. Both are designed to print the 
contents of the Document Master File; their functions vary in the amount of information 
printed and in format. MAINT6 provides a listing of the transactions to the Document 
Master File for any one update cycle, while LIBPRT provides a listing of the entire 
Document Master File. (See General Manual, Chapter VI, specifically the File 
Maintenance Module Tasks and Figure 17.) When using LIBPRT, no provision is made for 
creating a Descriptor Input File. 

SORTD1 — Sort the Descriptor Input File 

Abstract 

Since the Descriptor Input File will update the contents of the Inverted File, the two files 
must be in the same sequence. The Descriptor Input File is created in document number 
sequence and sorted to descriptor and document number sequence. 

DMAIN1 - Update the Inverted File 

Abstract 

The Inverted File will be updated with the contents of the Sorted Descriptor Input File. 
Because this file is created directly from the update of the Document Master File 
(MAINT4, MAINT6), the integrity of the Inverted File is assured. The Inverted File will 
be updated with the postings for each descriptor used, including document numbers, 
frequency count and dates of latest activity. 
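
The update can be sketched as follows. This is an illustration under stated assumptions rather than the DMAIN1 code: the record class, the transaction tuples, the action names and the date strings are hypothetical. The input is sorted to descriptor and document number sequence before it is applied, as SORTD1 does.

```python
from dataclasses import dataclass, field


@dataclass
class InvertedRecord:
    postings: list = field(default_factory=list)  # document numbers
    last_addition: str = ""                       # date of last addition
    last_deletion: str = ""                       # date of last deletion

    @property
    def frequency(self):
        """Number of documents in which the term is used as a descriptor."""
        return len(self.postings)


def update_inverted(inverted, transactions, run_date):
    """Apply (descriptor, document number, action) transactions in sequence."""
    for descriptor, document_number, action in sorted(transactions):
        record = inverted.setdefault(descriptor, InvertedRecord())
        if action == "add" and document_number not in record.postings:
            record.postings.append(document_number)
            record.postings.sort()
            record.last_addition = run_date
        elif action == "delete" and document_number in record.postings:
            record.postings.remove(document_number)
            record.last_deletion = run_date
    return inverted


# Invented Descriptor Input File contents, in document number sequence.
inverted = update_inverted(
    {},
    [("ZONING", "DOC1", "add"), ("HOUSING", "DOC2", "add"), ("HOUSING", "DOC1", "add")],
    run_date="010170",
)
```

Since the frequency count is derived from the postings, the two cannot disagree in this sketch; the CFS record stores the count as a separate field.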

DMAIN2 - Print the Summary Listing of the Inverted File 

Abstract 

The Summary Listing of the descriptors reports on the terms used in document analysis. 
This listing is useful in formulating search requests, evaluating the indexing procedures 
and reviewing the Thesaurus File since it provides (by term) the term type, the frequency 
of usage, and the last date of activity on the Document Master File. 

Programmer's Notes 

Internal descriptors. URBANDOC has created several internal descriptors, descriptors 
included for specific retrieval needs but which hold little or no value for the external user. 
An example of this is the issue number of the Input Index in descriptor form. This is also 
true for the variation of the document number as a descriptor; since document number is 
always shown with the reference there would be no value to repeat it as a series of entries, 
one for each reference, in the printed Inverted File. 

Subsetting the Inverted File. An internal descriptor is a category of descriptor innovated 
by URBANDOC and used primarily for document numbers in a descriptor form. These 
internal descriptors are identified by a pre-code of instead of a pre-code of (See 
Chapter VII, specifically Thesaurus Data Entry.) The purpose of creating this type of 
descriptor is that these terms can, and should, be eliminated from the Summary Listing. 
Inclusion of these terms would not increase the value of the report but rather 
unnecessarily lengthen it. 

DMAIN3 — Print a Detail Listing of the Inverted File 

Abstract 

The Detail Listing is a supplemental report to the Summary Listing, identifying the 
document numbers of the references to which the terms have been assigned. The Major 
Subject Listing in the Input Index provides a means of manually searching for the broad 
subject headings assigned to a document. The Detail Listing becomes significant since it is 
the only subject access to all the descriptors assigned to a document. 

Programmer's Notes 

Expansion of the report format. The user has the option of including the data presented 
in the Summary Listing as part of this report. Due to file organization and method of 
programming, frequency count cannot exceed one hundred and fifty per term. For terms 
with a greater frequency of usage, all postings will be shown but frequency count will 
appear as one hundred and fifty. For a correct frequency count for these terms, the user 
should refer to the Summary Listing. 

Subsetting the Inverted File. The exclusion of the internal descriptors should also apply 
to this report. The presence of these terms would not increase the value of the report but 
rather unnecessarily lengthen it. 

Tape File Specifications 

Docu-To-Tape File 

a. File Format: 

Header label, tape mark 
Data records, tape mark 
Trailer label, tape mark 

b. Header Label Format: Single fixed-length record of 40 characters 

Field                   Size      Cols      Contents 

Identification          X(4)       1- 4     '1HDR' 
Filler                  X(4)       5- 8 
Label                   X(12)      9-20     'DOCU-TO-TAPE' 
Filler                  X(10)     21-30 
Date                    X(6)      31-36     mmddyy 
Reel Number             X(3)      37-40 

c. Data Record Format: Fixed-length records of 84 characters 
   Blocking factor of 10 
   Padding record of 9s 

   Five data records: 

   Header 
   Descriptor 
   Subdescriptor 
   Bibliographic Data 
   Subject Heading 

d. Header Format: 

Field                   Size      Cols      Contents 

Document Number         X(14)      1-14 
Unit Number             X(4)      15-18 
Entry Code              X         19 
Entry Type              X         20 
Filler                  X(2)      21-22 
Data Entry Date         X(24)     23-46     'ENTRY DATE mm/yy' 
Filler                  X(38)     47-84 

e. Descriptor Format: 

Field                   Size      Cols      Contents 

Document Number         X(14)      1-14 
Unit Number             X(4)      15-18 
Entry Code              X         19 
Entry Type              X         20 
Filler                  X(2)      21-22 
Pre-code                X         23 
Descriptor              X(35)     24-58 
First Subdescriptor     X(12)     59-70 
Numerical Value         X(10)     71-80 
Filler                  X(4)      81-84 

Contents note: Used only for publications date 

f. Subdescriptor Format: 

Field                   Size      Cols      Contents 

Document Number         X(14)      1-14 
Unit Number             X(4)      15-18 
Entry Code              X         19 
Entry Type              X         20 
Descriptor Tie          X(2)      21-22 
Subdescriptor Number    X(2)      23-24 
Descriptor Root         X(6)      25-30 
Filler                  X(4)      31-34 
Subdescriptor           X(12)     35-46     Used only for entry date or content dates 
Numerical Value         X(10)     47-56     Used only for entry date or content dates 
Filler                  X(2)      57-58 
Subdescriptor           X(12)     59-70     Used only for content dates 
Numerical Value         X(10)     71-80     Used only for content dates 
Filler                  X(4)      81-84 

g. Bibliographic Data Format: 

Field                   Size      Cols      Contents 

Document Number         X(14)      1-14 
Unit Number             X(4)      15-18 
Entry Code              X         19 
Entry Type              X         20 
Element Number          X(2)      21-22 
Bibliographic Text      X(58)     23-80 
Filler                  X(4)      81-84 

h. Subject Heading Format: 

Field                   Size      Cols      Contents 

Document Number         X(14)      1-14 
Unit Number             X(4)      15-18 
Entry Code              X         19 
Entry Type              X         20 
Element Number          X(2)      21-22 
Subject Code            X         23 
Subject Sequence        X         24 
Subject Heading         X(35)     25-59 
Filler                  X(25)     60-84 

i. Trailer Label Format: Single fixed-length record of 40 characters 

Field                   Size      Cols      Contents 

Identification          X(4)       1- 4     For a multi-reel file, only 
                                            the last reel contains '1EOF'; 
                                            all other reels contain '1EOR' 
Filler                  X(36)      5-40 

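A column layout of this kind translates directly into a fixed-width parser. The sketch below uses the column positions of the Descriptor Format above; the function name, the entry code, entry type and pre-code values, and the sample record are hypothetical.

```python
RECORD_LENGTH = 84

# (field name, first column, last column), columns numbered from 1.
DESCRIPTOR_LAYOUT = [
    ("document_number", 1, 14),
    ("unit_number", 15, 18),
    ("entry_code", 19, 19),
    ("entry_type", 20, 20),
    ("filler_1", 21, 22),
    ("pre_code", 23, 23),
    ("descriptor", 24, 58),
    ("first_subdescriptor", 59, 70),
    ("numerical_value", 71, 80),
    ("filler_2", 81, 84),
]


def parse_fixed(record, layout, length=RECORD_LENGTH):
    """Slice a fixed-length record into named fields by column position."""
    if len(record) != length:
        raise ValueError(f"expected {length} characters, got {len(record)}")
    return {name: record[first - 1:last] for name, first, last in layout}


# Invented 84-character descriptor record.
sample = (
    "DRAAVLP66V391".ljust(14) + "0001" + "A" + "D" + "  " + "*"
    + "HOUSING".ljust(35) + " " * 12 + " " * 10 + " " * 4
)
fields = parse_fixed(sample, DESCRIPTOR_LAYOUT)
```

The same function serves the other four data records if it is given a layout list built from their column tables.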


Look-Up Input File 
Sorted Look-Up Input File 

a. File Format: 

Header label, tape mark 
Data records, tape mark 
Trailer label, tape mark 

b. Header Label Format: Single fixed-length record of 40 characters 

Field                   Size      Cols      Contents 

Identification          X(4)       1- 4     '1HDR' 
Filler                  X(4)       5- 8 
Label                   X(13)      9-21     'LOOK-UP INPUT' 
Filler                  X(12)     22-33 
Reel Number             X(3)      34-36 
Filler                  X(4)      37-40 

c. Data Record Format: Fixed-length records of 56 characters 
   Blocking factor of 20 records 
   Padding records of 9s 

Field                   Size      Cols 

Document Number         X(14)      1-14 
Unit Number             X(4)      15-18 
Entry Code              X         19 
Entry Type              X         20 
Pre-code                X         21 
Term                    X(23)     22-44 
Filler                  X(12)     45-56 

d. Trailer Label Format: Single fixed-length record of 40 characters 

Field                   Size      Cols      Contents 

Identification          X(4)       1- 4     For a multi-reel file, only 
                                            the last reel contains '1EOF'; 
                                            all other reels contain '1EOR' 
Filler                  X(36)      5-40 

Bad Look-Up File 
Sorted Bad Look-Up File 

a. File Format: 

Header label, tape mark 
Data records, tape mark 
Trailer label, tape mark 

b. Header Label Format: Single fixed-length record of 40 characters 

Field                   Size      Cols      Contents 

Identification          X(4)       1- 4     '1HDR' 
Filler                  X(4)       5- 8 
Label                   X(11)      9-19     'BAD LOOK-UP' 
Filler                  X(11)     20-30 
Date                    X(6)      31-36     mmddyy 
Filler                  X         37 
Reel Number             X(3)      38-40 

c. Record Format: Unblocked fixed-length records of 59 characters. 

Field                   Size      Cols 

Document Number         X(14)      1-14 
Unit Number             X(4)      15-18 
Entry Code              X         19 
Entry Type              X         20 
Pre-code                X         21 
Term                    X(35)     22-56 
Error Code              X(3)      57-59 

d. Trailer Label Format: Single fixed-length record of 40 characters. 

Field                   Size      Cols      Contents 

Identification          X(4)       1- 4     For a multi-reel file, only 
                                            the last reel contains '1EOF'; 
                                            all other reels contain '1EOR' 
Filler                  X(36)      5-40 



Edit File 



a. File Format: 

Header label, tape mark 
Data records, tape mark 
Trailer label, tape mark 

b. Header Label Format: Single fixed-length record of 24 characters.

Field              Size     Cols     Contents

Label              X(15)    1-15     'EDIT-TAPE-PASS1'
Reel Number        X(3)     16-18
Date               X(6)     19-24    mmddyy



c. Data Record Format: Unblocked variable-length records, maximum of 161 characters

Field               Size           Cols      Contents

Document Number     X(14)          1-14
Sequence Number     X(3)           15-17
Entry Code          X              18
Type Code           X              19
Revision Data       X(11)          20-30     Used only for Document Master
                                             Revisions
Filler              X(2)           31-32
Input Unit Number   X(4)           33-36
Text Size           X(3)           37-39
Text                X(text size)   40-161    Variable in length, maximum of
                                             122 characters
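
Because the Text field is variable in length, a reader must consult Text Size (columns 37-39) before extracting the text. The sketch below illustrates this under one assumption not stated in the manual: that Text Size is stored as three decimal digits. The function name `parse_edit_record` is hypothetical.

```python
from typing import Dict, Union

FIXED_PART = 39   # columns 1-39 precede the text
MAX_TEXT = 122    # columns 40-161


def parse_edit_record(record: str) -> Dict[str, Union[str, int]]:
    """Split one Edit File data record into its fixed fields and its text."""
    if len(record) < FIXED_PART:
        raise ValueError("record is shorter than its fixed portion")
    # Assumption: Text Size is a zero-padded decimal number.
    text_size = int(record[36:39])
    if not 0 <= text_size <= MAX_TEXT:
        raise ValueError(f"text size {text_size} is out of range")
    text = record[FIXED_PART:FIXED_PART + text_size]
    if len(text) != text_size:
        raise ValueError("record is shorter than its declared text size")
    return {
        "document_number": record[0:14],
        "sequence_number": record[14:17],
        "entry_code": record[17],
        "type_code": record[18],
        "revision_data": record[19:30],
        "input_unit_number": record[32:36],
        "text_size": text_size,
        "text": text,
    }
```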


d. Trailer Label Format: Single fixed-length record of 24 characters

Field              Size     Cols     Contents

Identification     X(4)     1-4      For a multi-reel file, only
                                     the last reel contains '1EOF';
                                     all other reels contain '1EOR'
Filler             X(20)    5-24





Document Master File 
Document Subset File 

a. File Format:

Header label, tape mark 
Data records, tape mark 
Trailer label, tape mark 

b. Header Label Format: Single fixed-length record of 24 characters

Field              Size     Cols     Contents

Label              X(12)    1-12     'DOCU/MASTERS'
Filler             X(3)     13-15
Reel Number        X(3)     16-18
Date               X(6)     19-24    mmddyy



c. Data Record Format: Unblocked variable-length records, maximum length of 2200 characters. 

See also the discussion of the Document Master File 

Two data records: 

Searchable Record 
Free Text Record 




d. Searchable Record Format: Unblocked variable-length record, maximum length of 2200 characters

Field              Size            Cols     Contents

Document Number    X(14)           1-14
Element Number     X(3)            15-17
Data Entry Date    X(24)           18-41    'ENTRY DATE mm/yy'
Filler             X(14)           42-55
Pointer Table      X(3)+NX(11)              N = number of descriptors
                                            assigned. See the discussion of
                                            the Document Master File.
Searchable Data                             Variable in length to 2200
                                            characters.


e. Free Text Record Format: Unblocked variable-length records, maximum length of 2200 characters

Field              Size     Cols     Contents

Document Number    X(14)    1-14
Element Number     X(3)     15-17
Data Entry Date    X(24)    18-41    'ENTRY DATE mm/yy'
Filler             X(14)    42-55
Free Text Data                       See the discussion of the
                                     Document Master File. Variable
                                     in length to 2200 characters.



f. Trailer Label Format: Single fixed-length record of 24 characters

Field              Size     Cols     Contents

Identification     X(3)     1-3      For a multi-reel file, only
                                     the last reel contains 'EOF';
                                     all other reels contain 'EOR'
Filler             X(21)    4-24



Print File 

a. File Format: 

Header label, tape mark 
Data records, tape mark 
Trailer label, tape mark 



b. Header Label Format: Single fixed-length record of 24 characters

Field              Size     Cols     Contents

Label              X(10)    1-10     'PRINT TAPE'
Filler             X(5)     11-15
Reel Number        X(3)     16-18
Date               X(6)     19-24    mmddyy

c. Data Record Format:

Three data records: 

Descriptor Transaction 
Searchable Record 
Free Text Record 

d. Descriptor Transaction Format: Unblocked fixed-length records of 43 characters

Field              Size     Cols

Transaction Code   X(2)     1-2
Pre-code           X        3
Descriptor         X(23)    4-26
Document Number    X(14)    27-40
Element Number     X(2)     41-42
Record Mark        X        43



e. Searchable Record Format: Unblocked variable-length record, maximum length of 2200 characters.

See also the discussion of the Document Master File

Field              Size            Cols     Contents

Entry Code         X(2)            1-2
Document Number    X(14)           3-16
Element Number     X(3)            17-19
Data Entry Date    X(24)           20-43    'ENTRY DATE mm/yy'
Filler             X(14)           44-57
Pointer Table      X(3)+NX(11)              N = number of descriptors. See
                                            the discussion of the Document
                                            Master File.
Searchable Data                             Variable in length to 2200
                                            characters.



f. Free Text Record Format: Unblocked variable-length record, maximum length of 2200 characters.

See also the discussion of the Document Master File

Field              Size     Cols     Contents

Document Number    X(14)    1-14
Element Number     X(3)     15-17
Data Entry Date    X(24)    18-41    'ENTRY DATE mm/yy'
Filler             X(14)    42-55
Free Text Data                       See the discussion of the
                                     Document Master File. Variable in
                                     length to 2200 characters.



g. Trailer Label Format: Single fixed-length record of 24 characters

Field              Size     Cols     Contents

Identification     X(3)     1-3      For a multi-reel file, only
                                     the last reel contains '1EOF';
                                     all other reels contain '1EOR'
Filler             X(21)    4-24

Descriptor Input File
Sorted Descriptor Input File 

a. File Format: 

Header label, tape mark 
Data records, tape mark 
Trailer label, tape mark 

b. Header Label Format: Single fixed-length record of 41 characters

Field              Size     Cols     Contents

Label              X(19)    1-19     'DESCRIPTOR-PASS 001'
Date               X(6)     20-25    mmddyy
Filler             X(16)    26-41




c. Data Record Format:

Fixed-length records of 42 characters
Blocking factor of 4
Padding records of 9s

Field              Size     Cols

Entry Code         X        1
Filler             X        2
Descriptor Type    X        3
Descriptor         X(23)    4-26
Document Number    X(14)    27-40
Filler             X(2)     41-42
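
With a blocking factor of 4, each physical tape block holds four 42-character records, and a short final block is filled out with padding records of 9s. A minimal sketch of unblocking follows, assuming a padding record consists entirely of the character 9 and that a block arrives as a 168-character string; the name `unblock` is hypothetical.

```python
from typing import List

RECORD_LENGTH = 42
BLOCKING_FACTOR = 4
BLOCK_LENGTH = RECORD_LENGTH * BLOCKING_FACTOR  # 168 characters per block
PADDING_RECORD = "9" * RECORD_LENGTH            # assumption: padding is all 9s


def unblock(block: str) -> List[str]:
    """Split one physical block into its data records, dropping padding records."""
    if len(block) != BLOCK_LENGTH:
        raise ValueError(f"expected {BLOCK_LENGTH} characters, got {len(block)}")
    records = [block[i:i + RECORD_LENGTH] for i in range(0, BLOCK_LENGTH, RECORD_LENGTH)]
    return [record for record in records if record != PADDING_RECORD]
```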




d. Trailer Label Format: Single fixed-length record of 41 characters

Field              Size     Cols     Contents

Identification     X(4)     1-4      For a multi-reel file, only
                                     the last reel contains '1EOF';
                                     all other reels contain '1EOR'
Filler             X(37)    5-41





Inverted File 



a. File Format: 

Header label, tape mark 
Data records, tape mark 
Trailer label, tape mark 

b. Header Label Format: Single fixed-length record of 41 characters

Field              Size     Cols     Contents

Label              X(15)    1-15     'DESCRIPTOR-FILE'
Reel Number        X(3)     16-18
Filler             X(2)     19-20
Date Created       X(12)    21-32    'DATE CREATED'
Filler             X        33
Date               X(8)     34-41    mm/dd/yy



c. Data Record Format: Unblocked variable-length records, maximum length of 2157 characters

Field                     Size      Cols      Contents

Descriptor                X(23)     1-23
Filler                    X         24
Descriptor Type           X         25
Frequency Count           X(5)      26-30
Date of Last Addition     X(6)      31-36     mmddyy
Date of Last Deletion     X(6)      37-42     mmddyy
Highest Document Number   X(14)     43-56     Blank for common descriptor
for Descriptor
Record Mark               X         57
Document Number           NX(14)    58-end    N = number of documents.
                                              Occurs maximum of 150 times
                                              only for precise or internal
                                              descriptors
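
The maximum record length follows from the layout: 57 characters of fixed fields plus up to 150 document numbers of 14 characters each gives 57 + 150 x 14 = 2157. The sketch below parses such a record; it assumes the Frequency Count is stored as decimal digits, and the name `parse_inverted_record` is hypothetical.

```python
from typing import Dict, List, Union

HEADER_LENGTH = 57        # columns 1-57, through the Record Mark
DOC_NUMBER_LENGTH = 14
MAX_POSTINGS = 150
MAX_RECORD_LENGTH = HEADER_LENGTH + DOC_NUMBER_LENGTH * MAX_POSTINGS  # 2157


def parse_inverted_record(record: str) -> Dict[str, Union[str, int, List[str]]]:
    """Split one Inverted File record into its header fields and posted documents."""
    if not HEADER_LENGTH <= len(record) <= MAX_RECORD_LENGTH:
        raise ValueError(f"record length {len(record)} is out of range")
    postings = record[HEADER_LENGTH:]
    if len(postings) % DOC_NUMBER_LENGTH != 0:
        raise ValueError("posting area is not a whole number of document numbers")
    return {
        "descriptor": record[0:23].rstrip(),
        "descriptor_type": record[24],
        "frequency_count": int(record[25:30]),  # assumption: decimal digits
        "last_addition": record[30:36],
        "last_deletion": record[36:42],
        "highest_document_number": record[42:56],
        "document_numbers": [
            postings[i:i + DOC_NUMBER_LENGTH]
            for i in range(0, len(postings), DOC_NUMBER_LENGTH)
        ],
    }
```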

d. Trailer Label Format: Single fixed-length record of 24 characters

Field              Size     Cols     Contents

Identification     X(3)     1-3      For a multi-reel file, only
                                     the last reel contains 'EOF';
                                     all other reels contain 'EOR'
Filler             X(21)    4-24



References to the URBANDOC Final Report 

Much of the information presented in this chapter is designed to be used with sections
of the General Manual (G.M.) and other sections of the Operations Manual (O.M.).

For additional information on the Document Master File and the Inverted File, their 
formats and the considerations in creating and maintaining entries on the files, see: 



Manual   Chapter                         Section

G.M.     I - Introduction                The Bibliographic Records
G.M.     II - Document Identification    URBANDOC Document
                                         Numbers: General
G.M.     III - Document Analysis:        General Considerations
         Descriptive
G.M.     IV - Document Analysis:         General Considerations
         Content
G.M.     VI - Systems Modules: Input     File Maintenance Module:
                                         Document Master File
                                         Description, Inverted
                                         File Description



For additional information on the design and goals of the File Maintenance Module, see:

Manual   Chapter                         Section

G.M.     VI - Systems Modules: Input     File Maintenance Module:
                                         Function, Tasks

For additional information on operating this portion of the system, see:

Manual   Chapter                         Section

O.M.     VIII - Processing Cycles        Editing and Validation Cycle,
                                         Input Processing Cycle,
                                         Retrieval Report File
                                         Maintenance Cycle,
                                         Miscellaneous Publications
                                         Products
O.M.     IX - Operating Instructions     File Maintenance Module
O.M.     XI - Tape Library and Report    File Maintenance Module
         Controls
O.M.     XII - Timing                    File Maintenance Module

SEARCH MODULE
Introduction 

The Search Module in the URBANDOC system is a composite of two separate
subsystems: the CFS Search Module and the Engineering Index Search Expansion
Programs (SEP). These two have been interfaced to form a processing whole.
Originally, only the CFS programs were used. The Engineering Index programs were
subsequently added.

Knowledge of the basic concept underlying the Combined File Search process is 
useful in understanding the search run. The Combined File Search uses both a 
Descriptor File and the Master File in the search process. This process is based upon 
the following principle: 

A search request contains a number of descriptors. If we take any one of these 
descriptors . . . and obtain the list of items posted under that descriptor in the 
Descriptor File, then we have immediately reduced the possible responses to the 
request from the entire Master File to that limited item list. It is then possible to go 
to the Master File and perform a detailed search on the selected items. Thus, the 
actual search is performed on the Master File; the Descriptor File is only a tool for 
reducing the scope of the search. 1 (See Figure 5.) 
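
The principle quoted above can be illustrated with a short sketch. It is not the CFS implementation: it assumes a request in which every descriptor must be present, models the Descriptor File as a dictionary of posting lists and the Master File as a dictionary of descriptor sets, and chooses the least-posted descriptor as the starting point (the quoted text allows any descriptor). All names are hypothetical.

```python
from typing import Dict, List, Set


def combined_file_search(
    request: Set[str],
    descriptor_file: Dict[str, List[str]],
    master_file: Dict[str, Set[str]],
) -> List[str]:
    """Return the items whose Master File entry carries every requested descriptor."""
    if not request:
        return []
    # Step 1: take one descriptor and fetch its posting list from the
    # Descriptor File. The shortest list narrows the search the most.
    key = min(request, key=lambda d: len(descriptor_file.get(d, [])))
    candidates = descriptor_file.get(key, [])
    # Step 2: perform the detailed search on the Master File, but only
    # for the candidate items found in step 1.
    return sorted(
        item for item in candidates
        if request <= master_file.get(item, set())
    )
```

Only the candidate items are examined in the second step, which is the saving the Combined File Search is designed to achieve.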

The search run consists of all processing necessary to produce the output requested 
by the user of the system. The process begins with the processing of a card file 
containing the users' requests and terminates when the documents selected by the
search have been printed out. This process is accomplished in six phases. (See 
Figure 6.) 

Phase Function 

1 Card-to-tape and edit the request card file 

2 Descriptor File search 

3 Sort output from Phase 2 into item number sequence 

4 Associate sorted item numbers from Phase 3 with full request terms 

5 Master File Search 

6 Print search run output. 2 

Experience with the released search portion of the CFS system revealed three significant 
limitations: 

The format of the printed search output was generally unacceptable for widespread 
usage, especially without annotation; 

The formation of complex search expressions was cumbersome to the analyst. On
occasion the resulting search expressions did not agree with the conceptualized
definition of the search request;

The sequencing of criteria within the search terms was the responsibility of the
document analyst. Usually the formulated search expression did not run at
maximum search efficiency, i.e. the terms were not ordered with the least frequent
terms first. (A joint review of the request by the document analyst and systems
analyst would have resolved this problem but would have been costly in terms of
man-time.)

1 International Business Machines, Data Processing Division, 1401 Information Storage Retrieval
System (The Combined File Search System), by Donald Prentice, Gary deGraw, Alice Smith and I.
Albert Warheit, Rev. III, 1 v. (White Plains, N.Y., 1966), p 3.01.

2 Ibid., p 3.04.

URBANDOC became aware of a series of programs called the Search Expansion Program
(SEP) designed and implemented by the Engineering Index specifically to overcome the 
above limitations of the CFS system. URBANDOC acquired these programs and 
interfaced them with the existing CFS system. (See Chapter 1, specifically Systems 
Background.) 

The Search Expansion Programs (SEP) were added to the CFS Search Programs to reduce 
the amount of time spent in writing a request, to minimize the time required to perform a 
search and to produce the search output in a more acceptable format. With CFS, the 
document analyst was required to work with a cumbersome data entry procedure. This 
was particularly true when one term was used more than once in a request, for example 
A*(B+C) which became A*B+A*C in the CFS format. The SEP programs would now
perform the expansion. The search time could be minimized by placing the least used
terms at the beginning of the request. More importantly, the SEP programs relieved the
document analyst of the burden of manually sequencing the terms. These points will be
discussed in greater detail in the Narrative (next section). 
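
The two services described above, expanding a factored request and ordering terms by frequency of use, can be sketched as follows. This is an illustration only and does not reproduce the SEP programs: it assumes `*` means AND and `+` means OR, that terms are simple alphanumeric names, and that a frequency table is available from the Descriptor File. The function names are hypothetical.

```python
import re
from typing import Dict, List


def expand(expr: str) -> List[List[str]]:
    """Expand a request such as A*(B+C) into a list of AND-groups."""
    tokens = re.findall(r"[A-Za-z0-9]+|[*+()]", expr)
    pos = 0

    def parse_expr() -> List[List[str]]:
        nonlocal pos
        products = parse_term()
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            products = products + parse_term()  # OR: keep both sets of groups
        return products

    def parse_term() -> List[List[str]]:
        nonlocal pos
        products = parse_factor()
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            right = parse_factor()
            # AND: distribute every left group over every right group
            products = [p + q for p in products for q in right]
        return products

    def parse_factor() -> List[List[str]]:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of request")
        token = tokens[pos]
        pos += 1
        if token == "(":
            inner = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return inner
        if token in {"*", "+", ")"}:
            raise ValueError(f"unexpected token {token!r}")
        return [[token]]

    result = parse_expr()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result


def order_by_frequency(products: List[List[str]], frequency: Dict[str, int]) -> List[List[str]]:
    """Within each AND-group, place the least frequently used terms first."""
    return [sorted(group, key=lambda term: frequency.get(term, 0)) for group in products]


def to_cfs(products: List[List[str]]) -> str:
    """Write the expanded request in the A*B+A*C form."""
    return "+".join("*".join(group) for group in products)
```

For example, `to_cfs(expand("A*(B+C)"))` yields `A*B+A*C`, matching the expansion cited in the text. Terms missing from the frequency table are treated here as having a count of zero, which is a choice made for the sketch.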

The SEP system consists of two groups of programs. One group is a set of diagnostic 
programs, designed to detect potential processing problems in the CFS search prior to the 
actual search. The other is a series of formatting and print programs designed to replace 
their corresponding counterparts in CFS. 

The search output is the documents meeting the search requirements (the hits) listed for 
each request. The entry in the Retrieval Report will consist of the title information from 
the input request plus the entire record for that document in the Document Master File: 
descriptors, authors, title, imprint and collation, subject headings, etc. The term
"Retrieval Report" is an URBANDOC label for the printed search output; it is not a
specific part of the CFS or the Engineering Index systems.

In its current form, a search request may be processed either as part of the Expanded 
Search (including the use of the SEP system) or as a Basic Search (using only the CFS 
portion of the search). Depending upon the search method selected, the corresponding 
form for Search Data Entry will be used. (This will be discussed in greater detail in the 
following section on Narrative. See Chapter VII, specifically Search Data Entry.) 

The following section on Narrative provides a more detailed discussion of the actual 
processing techniques used in the composite search system. If the reader only wishes an 
overview of the steps involved in processing a search request, see Program Abstracts and 
omit the Narrative.

[Figure 5. Concept of the Combined File Search Method. 3 Note: this is meant to
present the concept of the search method. It is not intended to represent actual
programming processes. Recoverable steps: perform detail search with full request
on potential items only; print information on selected items.]

[Figure 6. Logical Flow of Combined File Search Process. 4]

3 Ibid., p 3.02.

4 Ibid., p 3.03.

Narrative

The General Manual described the search process as one total entity. However, a detailed 
examination of the Search Module would reveal that it is a series of subsystems, 
separately developed by different organizations, which have been interfaced to form a 
whole. 

In order to discuss the Search Module to the depth that it warrants, URBANDOC has
inserted parts of the narrative from the author organizations' documentation. URBANDOC
comments and explanations are inserted for a smooth transition from one topic to another.

Three types of searches may be specified by the user of this system, DOCUMENT, 
BOOLEAN, or MIXED. 

Document Search. Purpose: To retrieve the information pertaining to a particular 
document or series of documents. Input Data Required: List of document numbers 
for which information is to be retrieved. 

Boolean Search. Purpose: To retrieve document numbers and information related 
to an object which may be described as using a Boolean statement. Input 
Data Required: A list of criteria which describes the object worded in a Boolean 
statement. 

Mixed Search. Purpose: To retrieve information related to a specific document or 
series of documents provided they conform to a given set of criteria. The Mixed 
Search is a combination of the Document Search and the Boolean Search. Input 
Data Required: A list of documents to be examined and the descriptors which 
describe the criteria for which the documents are to be examined. 5

URBANDOC usually performs a Boolean or Descriptor Search. For this reason, the 
Document Search or Mixed Search is not discussed at length in the URBANDOC Manuals. 
Both the Boolean Search and the Mixed Search use the Inverted File for an inverted table 
look-up followed by a linear search of the Document Master File. The Document Search 
does not perform the inverted table look-up but is a straight linear search of the 
Document Master File. 
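
The difference between the Document Search and the Mixed Search can be sketched as follows. This is an illustration under stated assumptions, not the CFS code: the Document Master File is modeled as a list of dictionaries scanned in order, the Inverted File as a dictionary of posting lists, and the criteria as a set of descriptors that must all be present. All names are hypothetical.

```python
from typing import Dict, List, Set

MasterRecord = Dict[str, object]  # {"number": str, "descriptors": set of str}


def document_search(numbers: List[str], master_file: List[MasterRecord]) -> List[str]:
    """Straight linear search of the master file for the listed documents."""
    wanted = set(numbers)
    return [rec["number"] for rec in master_file if rec["number"] in wanted]


def mixed_search(
    numbers: List[str],
    required: Set[str],
    inverted_file: Dict[str, List[str]],
    master_file: List[MasterRecord],
) -> List[str]:
    """Listed documents that also satisfy the criteria."""
    # Inverted table look-up: keep only listed documents posted under
    # every required descriptor.
    candidates = set(numbers)
    for descriptor in required:
        candidates &= set(inverted_file.get(descriptor, []))
    # Linear search of the master file, confirming each candidate.
    return [
        rec["number"] for rec in master_file
        if rec["number"] in candidates and required <= rec["descriptors"]
    ]
```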

Three data files are required in the search process: Request File, Descriptor File, 
Master File. 

Request File: This file is a card input batch of assembled search requests, each 
request containing the criteria for search and retrieval as specified by the requestor. 
(Note: Here the term, Request File, is the file called the Basic Search Input in the 
Operations Manual . ) 

Descriptor File: The file is a series of items, or documents, each of which contains
all information indexed for the particular item.

5 Ibid., p 3.21.